HydrOffice news

Dealing with ASCII data formats

Feb. 10, 2015, 6:10 p.m., author: giumas
 

Aim

 
Huddl has been developed with binary data formats as a primary target. However, the resulting language is quite flexible, and that can be also used to access ASCII data format. To demonstrate this flexibility, this post shows the data manipulation features of a Huddl-based module to read the Hypack HSX data format. These are the first lines of some test survey data stored in such a data format:
FTP NEW 2
HSX 6
TND 08:53:08 02/27/2008
INF "J. Doe" "RV Quicky" "Full CVRG" "OpAREA" 0.72 0.00 1423.11
HSP 0.0 70.0 105.0 105.0 65 65 3 1 1.6 0.0 1 1
DEV 0 20 "Hypack Navigation"
DV2 0 14 0 1
OF2 0 0 -5.1 -3.9 -22.3 0.00 0.00 0.00 0.80
PRI 0
DEV 1 32784 "Reson Seabat 81xx (Network)"
DV2 1 9 0 1
OF2 1 3 -4.3 0.0 2.7 3.00 0.20 -2.00 0.00
MBI 1 1 0 1801 101 0 75.000 -1.500
SSI 1 0 1000 1000
DEV 2 32 "NMEA-0183 Gyro"
DV2 2 20 0 1
OF2 2 1 0.0 0.0 0.0 0.00 0.00 0.00 0.00
DEV 3 512 "TSS DMS"
DV2 3 200 0 1
OF2 3 2 0.0 0.0 -2.9 0.00 -0.40 0.85 0.00
...
 

Python script

 
The Huddl-based Python module is imported at the first line. Then, several third-party and standard libraries are also imported, mainly for data visualization and handy storage.
In [1]:
#!/usr/local/bin/python
import hypack_hsx as hsx
import os.path
import datetime
import operator
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
from matplotlib import rcParams
from matplotlib import ticker
from matplotlib import colors
 
Some simple helper classes are created to store and manage the data content:
In [2]:
class HsxInfo:
    """ container for TND, INF topblocks """
    dt = None
    date = None
    surveyor = str()
    boat = str()
    project = str()
    area = str()

class HsxAttitude:
    """ container for Hypack attitude data """
    time = np.zeros(0, dtype='datetime64[us]')
    roll = np.zeros(0, dtype=np.float)
    pitch = np.zeros(0, dtype=np.float)
    heave = np.zeros(0, dtype=np.float)

class HsxMbPing:
    """ container for Hypack mbes ping data """
    slr = np.zeros(0, dtype=np.float)
    lvl = np.zeros(0, dtype=np.int)
    qlt = np.zeros(0, dtype=np.int)

class HsxMbes:
    """ container for Hypack mbes data """
    time = np.zeros(0, dtype='datetime64[us]')
    dn = np.zeros(0, dtype=np.int)
    st = np.zeros(0, dtype=np.int)
    sf = np.zeros(0, dtype=np.int)
    bd = np.zeros(0, dtype=np.int)
    n = np.zeros(0, dtype=np.int)
    sv = np.zeros(0, dtype=np.float)
    pn = np.zeros(0, dtype=np.int)
    pings = list()

class HsxData:
    """ container for Hypack HSX data """
    info = HsxInfo()
    att = HsxAttitude()
    mb = HsxMbes()
 

Content Exploration

 
Following an approach similar to the one explained in this post, during the file traversal, we collect information about the avaialable TopBlocks. These functions are used to list the collected information:
In [3]:
def print_info(datagram_nr, datagram_dict):
    """ Provide some info about the data file """
    print("\n> data file content:")
    print("  . total blocks:\t", datagram_nr)
    print("  . blocks list:")
    print("  [type] --> [counter]")
    for d in reversed(sorted(datagram_dict, key=datagram_dict.get)):
        print("   #%s --> %s" % (hex(d), datagram_dict[d]))
    print("")

def plot_info(datagram_dict):
    """ Plot the blocks list as bar chart ordered from the most common """
    keys, values = zip(*sorted(datagram_dict.items(), key=operator.itemgetter(1), reverse=True))
    fig, ax0 = plt.subplots(ncols=1, nrows=1, figsize=(12, 8), dpi=100)
    matplotlib.rcParams.update({'font.size': 9, 'text.usetex': True, 'font.family': 'serif'})
    bar = ax0.bar(range(len(values)), values, align='center', color='#1e90ff')
    # attach some text labels
    for rect in bar:
        height = rect.get_height()
        ax0.text(rect.get_x()+rect.get_width()/2., height + 100, '%d'%int(height), ha='center', va='bottom', color='#a0522d')
    ax0.set_title('$TopBlocks$  $List$')
    ax0.tick_params(axis='y', colors='#a0522d')
    ax0.set_ylabel('$count$', color='#a0522d')
    ax0.set_xlabel('$block$ $id[hex]$', color='#1e90ff')
    ax0.set_xticks(range(len(keys)))
    ax0.set_xticklabels([hex(lab) for lab in keys], color='#1e90ff')
    ax0.set_ylim((0, 1.1*max(values)))
    fig.autofmt_xdate()
    plt.show()
    fig.savefig('hsx_datagrams', dpi=600)
 

Metadata

 
The HSX format has several TopBlocks that store useful metadata. As example, we write some basic functions to access (and then print) this information:
In [4]:
def read_tnd(data_header, data_content, hsx_data):
    """ read and store a Hypack TND topblock """
    date_str = data_content.TND.d
    hsx_data.info.date = datetime.datetime.strptime(date_str, "%m/%d/%Y")
    datetime_str = data_content.TND.t + data_content.TND.d
    hsx_data.info.dt = datetime.datetime.strptime(datetime_str, "%H:%M:%S%m/%d/%Y")

def print_tnd(hsx_data):
    """ print the Hypack TND topblock """
    print("> TND:", hsx_data.info.dt)

def read_inf(data_header, data_content, hsx_data):
    """ read and store a Hypack INF topblock """
    hsx_data.info.surveyor = data_content.INF.surveyor
    hsx_data.info.boat = data_content.INF.boat
    hsx_data.info.project = data_content.INF.project
    hsx_data.info.area = data_content.INF.area

def print_inf(hsx_data):
    """ print the Hypack INF topblock """
    print("> INF:")
    print("  . surveyor:", hsx_data.info.surveyor)
    print("  . boat:    ", hsx_data.info.boat)
    print("  . project: ", hsx_data.info.project)
    print("  . area:    ", hsx_data.info.area)
 

Roll, Pith, and Heave

 
In the following code snippets, methods to collect, print, and display attitude data are provided:
In [5]:
def read_hcp(data_header, data_content, hsx_data):
    """ read and store a Hypack HCP topblock """
    time_mn = hsx_data.info.date + datetime.timedelta(seconds=float(data_content.HCP.t))
    hsx_data.att.time = np.append(hsx_data.att.time, time_mn)
    hsx_data.att.roll = np.append(hsx_data.att.roll, float(data_content.HCP.r))
    hsx_data.att.pitch = np.append(hsx_data.att.pitch, float(data_content.HCP.p))
    hsx_data.att.heave = np.append(hsx_data.att.heave, float(data_content.HCP.h))

def print_attitude(hsx_data):
    """ provide some info about read Hypack attitude data """
    df = pd.DataFrame({'Roll': hsx_data.att.roll,
                       'Pitch': hsx_data.att.pitch,
                       'Heave': hsx_data.att.heave}, index=hsx_data.att.time)
    print(df.head())
    print(df.index)
    print('\n', df.describe())

def plot_attitude(hsx_data):
    """ plot and save the read Hypack attitude data """
    fig, (ax0, ax1, ax2) = plt.subplots(ncols=1, nrows=3, figsize=(8, 12), dpi=150)
    xfmt = matplotlib.dates.DateFormatter('%H:%M:%S')
    matplotlib.rcParams.update({'font.size': 6, 'text.usetex': True, 'font.family': 'serif'})

    ax0.xaxis.set_major_formatter(xfmt)
    plot0 = ax0.plot(hsx_data.att.time, hsx_data.att.roll, linewidth=0.5, color='green')
    mean_value = np.mean(hsx_data.att.roll)
    ax0.set_ylim([mean_value - .8, mean_value + .8])
    ax0.grid(color='gray', alpha=0.5, linewidth=0.5)
    ax0.set_title('$Roll$')
    ax0.title.set_color('green')
    ax0.set_ylabel('$value [deg]$')
    ax0.yaxis.set_label_coords(-0.06, 0.5)

    ax1.xaxis.set_major_formatter(xfmt)
    plot1 = ax1.plot(hsx_data.att.time, hsx_data.att.pitch, linewidth=0.5, color='orange')
    mean_value = np.mean(hsx_data.att.pitch)
    ax1.set_ylim([mean_value - .8, mean_value + .8])
    ax1.grid(color='gray', alpha=0.5, linewidth=0.5)
    ax1.set_title('$Pitch$')
    ax1.title.set_color('orange')
    ax1.set_ylabel('$value [deg]$')
    ax1.yaxis.set_label_coords(-0.06, 0.5)

    ax2.xaxis.set_major_formatter(xfmt)
    plot2 = ax2.plot(hsx_data.att.time, hsx_data.att.heave, linewidth=0.5, color='red')
    mean_value = np.mean(hsx_data.att.heave)
    ax2.set_ylim([mean_value - .2, mean_value + .2])
    ax2.grid(color='gray', alpha=0.5, linewidth=0.5)
    ax2.set_title('$Heave$')
    ax2.title.set_color('red')
    ax2.set_ylabel('$value [m]$')
    ax2.yaxis.set_label_coords(-0.06, 0.5)

    plt.tight_layout()
    plt.show()
    fig.savefig('hsx_attitude', dpi=600)
 

Bathymetric Data

 
The bathymetric data are stored within the RMB (Raw Range Multibeam) TopBlocks. Since the slant range, the intensity and the quality information are stored as nested vectors in such a TopBlock, we create a helper function read_beams that is called from the parent method read_rmb to read this information for all the beams in each ping.
In [6]:
def read_beams(data_header, data_content, hsx_data):
    """ read and store a Hypack RMB beams data """
    new_ping = HsxMbPing()
    slr_str = hsx.hsx_rmb_content_array_getitem(data_content.RMB.content, 0).data
    new_ping.slr = np.array(list(map(float, slr_str.split())))
    lvl_str = hsx.hsx_rmb_content_array_getitem(data_content.RMB.content, 1).data
    new_ping.lvl = np.array(list(map(int, lvl_str.split())))
    qlt_str = hsx.hsx_rmb_content_array_getitem(data_content.RMB.content, 2).data
    new_ping.qlt = np.array(list(map(int, qlt_str.split())))
    hsx_data.mb.pings.append(new_ping)

def read_rmb(data_header, data_content, hsx_data):
    """ read and store a Hypack RMB topblock """
    time_mn = hsx_data.info.date + datetime.timedelta(seconds=float(data_content.RMB.t))
    hsx_data.mb.time = np.append(hsx_data.mb.time, time_mn)
    hsx_data.mb.dn = np.append(hsx_data.mb.dn, int(data_content.RMB.dn))
    hsx_data.mb.st = np.append(hsx_data.mb.st, int(data_content.RMB.st))
    hsx_data.mb.sf = np.append(hsx_data.mb.sf, "{0:#0{1}x}".format(int(data_content.RMB.sf, 16), 6))
    hsx_data.mb.bd = np.append(hsx_data.mb.bd, "{0:#0{1}x}".format(int(data_content.RMB.bd, 16), 6))
    hsx_data.mb.n = np.append(hsx_data.mb.n, int(data_content.RMB.n))
    hsx_data.mb.sv = np.append(hsx_data.mb.sv, float(data_content.RMB.sv))
    hsx_data.mb.pn = np.append(hsx_data.mb.pn, int(data_content.RMB.pn))
    read_beams(data_header, data_content, hsx_data)

def print_rmb(hsx_data):
    """ print info about the read Hypack RMB data """
    print("> Raw Multi Beam topblocks info:")

    df = pd.DataFrame({'Dev#': hsx_data.mb.dn,
                       'Flags': hsx_data.mb.sf,
                       'Beam Data': hsx_data.mb.bd,
                       'Beams': hsx_data.mb.n,
                       'SSP': hsx_data.mb.sv,
                       'Ping#': hsx_data.mb.pn}, index=hsx_data.mb.time)
    print(df.head())
    #print('\n', df.describe())

def print_rmb_ping(hsx_data, ping_nr):
    """ print info about the read Hypack RMB ping data """
    if (ping_nr < 0) or (ping_nr >= len(hsx_data.mb.time)):
        raise IOError("Invalid ping required")
    print("> printing #%d of %d pings" % (ping_nr, len(hsx_data.mb.time)))
    ping = hsx_data.mb.pings[ping_nr]
    print("Beam\tSlRng\tLevel\tQuality")
    for i in range(hsx_data.mb.n[ping_nr]):
        if i < 10:
            print("#%3d\t%.3f\t%5d\t%3d" % (i, ping.slr[i], ping.lvl[i], ping.qlt[i]))

def plot_rmb_ping(hsx_data, ping_nr):
    """ plot and save the read Hypack ping data """
    if (ping_nr < 0) or (ping_nr >= len(hsx_data.mb.time)):
        raise IOError("Invalid ping required")
    ping = hsx_data.mb.pings[ping_nr]
    beams = range(hsx_data.mb.n[ping_nr])

    fig, (ax0, ax1, ax2) = plt.subplots(ncols=1, nrows=3, figsize=(12, 6), dpi=100)
    matplotlib.rcParams.update({'font.size': 6, 'text.usetex': True, 'font.family': 'serif'})

    plot0 = ax0.plot(beams, ping.slr, linewidth=0.5, color='#1e90ff')
    ax0.set_ylim([0, 200])
    ax0.grid(color='gray', alpha=0.5, linewidth=1.1)
    ax0.set_title('$Slant$ $range$')
    ax0.title.set_color('#1e90ff')
    ax0.set_ylabel('$[work$ $units]$')
    ax0.yaxis.set_label_coords(-0.06, 0.5)
    ax0.invert_yaxis()

    plot1 = ax1.fill_between(beams, ping.lvl, linewidth=0.5, facecolor='#a0522d', alpha=0.8)
    ax1.set_ylim([0, 30000])
    ax1.grid(color='gray', alpha=0.5, linewidth=0.5)
    ax1.set_title('$Beam$ $Intensity$')
    ax1.title.set_color('#a0522d')
    ax1.set_ylabel('$[work$ $units]$')
    ax1.yaxis.set_label_coords(-0.06, 0.5)

    # make a color map of fixed colors
    plot2 = ax2.bar(beams, ping.qlt, 1, linewidth=0.5, color='green', alpha=0.8)
    ax2.set_ylim([0, 3.5])
    ax2.set_xlim([0, hsx_data.mb.n[ping_nr]-1])
    ax2.grid(color='gray', alpha=0.7, linewidth=0.5)
    ax2.set_title('$Beam$ $Quality$')
    ax2.title.set_color('green')
    ax2.set_ylabel('$Quality$ $Code$')
    ax2.yaxis.set_label_coords(-0.06, 0.5)

    #plt.tight_layout()
    plt.suptitle("Data Monitoring - Ping %d [%d beams]" % (ping_nr, hsx_data.mb.n[ping_nr]), fontsize=8)
    plt.show()
    fig.savefig('hsx_attitude', dpi=600)
 

Data access and visualization

 
The reading mechanism to access the data file content is quite simple, and it has been already presented here. We first open the raw file, then we check if the opening has been successful:
In [7]:
# Open the data file
testfile_path = "C:/_work/_sandy/oceano/ancillary/input/hypack/000_0853.HSX"
if not os.path.isfile(testfile_path):
    raise IOError("the file %s does not exist" % testfile_path)
file_size = os.path.getsize(testfile_path)
ip = hsx.fopen(testfile_path, "rb")
if ip is None:
    raise IOError("data file not successfully open")
print("> data file successfully open")
position = hsx.ftell(ip)
print("> initial position: ", position)
 
> data file successfully open
> initial position:  0
 
Some variables are declared here that will be populated in the following code snippets:
In [8]:
# counters, block dictionary and other information
datagram_dict = dict()
n_read = 0
datagram_nr = 0
data = hsx.hsx_hsxRev8_t()
hsx_data = HsxData()
 
We now use a loop that goes through all the content datagrams looking for datagram types and counting them using the previously declared dictionary. In addition, if the TopBlock read stores metadata information, bathymetry or attitude data, the previously introduced functions are called:
In [9]:
# Loop to read all the datagrams
abort = False
resync_event = 0
hsx.fseek(ip, 0, 0) # to go back at the beginning of data file
print("> looping through the whole data file: ", n_read)
while not abort:
    rc, n_read = hsx.read_hsx_hsxRev8(ip, data, n_read, hsx.validate_hsx_hdr)

    if rc == hsx.HUDDLER_TOPBLOCK_OK:
        if datagram_nr < 5: # to print only the first blocks
            print("  #", datagram_nr, "\t->\tbytes: ", n_read, "\ttype: ", data.header.type, "[", hex(data.header.type), "]")
        datagram_nr += 1
        datagram_dict[data.header.type] = datagram_dict.get(data.header.type, 0) + 1
        if data.header.type == hsx.HSX_HSXREV8_TND:
            read_tnd(data.header, data.datagram, hsx_data)
        elif data.header.type == hsx.HSX_HSXREV8_INF:
            read_inf(data.header, data.datagram, hsx_data)
        elif data.header.type == hsx.HSX_HSXREV8_HCP:
            read_hcp(data.header, data.datagram, hsx_data)
        elif data.header.type == hsx.HSX_HSXREV8_RMB:
            read_rmb(data.header, data.datagram, hsx_data)

    elif rc == hsx.HUDDLER_TOPBLOCK_UNKNOWN:
        print("> #", datagram_nr, " -> UNKNOWN datagram ", hex(data.header.type), " at offset ", position)
        datagram_nr += 1
        hsx.fseek(ip, position, 0)                 # Return to position before read
        hsx.fseek(ip, 1, 1)                        # Move on by one --- no length

    elif rc == hsx.HUDDLER_NO_HEADER:
        print("> #", datagram_nr, " -> no header at offset ", position, " -> resynch #", resync_event)
        resync_event += 1

    elif rc == hsx.HUDDLER_TOPBLOCK_READFAIL:
        print("> #", datagram_nr, " -> fail reading datagram at offset ", position, " -> resynch #", resync_event)
        resync_event += 1

    elif rc == hsx.HUDDLER_INVALID_TAIL:
        print("> #", datagram_nr, " -> invalid tail at offset ", position, " -> resynch #", resync_event)
        resync_event += 1

    elif rc == hsx.HUDDLER_EOF_WHILE:
        print("  . . .\n"
              "  #", datagram_nr, "->\tend of file at offset ", position)
        abort = True

    elif rc == hsx.HUDDLER_READ_ERROR:
        print("> #", datagram_nr, " -> unexpected read error at offset ", position, " -> aborting")
        abort = True

    else:
        print("> unknown read return: ", rc)
        abort = True

    hsx.clean_hsx_hsxRev8(data)
    position = hsx.ftell(ip)

# Closing the data source
print("> last position read in file: ", position, " [file size: ", file_size, "]")
if position == file_size:
    print("> the whole data file has been traversed")
else:
    print("> WARNING: the whole data file has NOT been traversed !!")
ret = hsx.fclose(ip)
if ret == 0:
    print("> data file closed")
 
> looping through the whole data file:  0
  # 0 	->	bytes:  10 	type:  542135366 [ 0x20505446 ]
  # 1 	->	bytes:  6 	type:  542659400 [ 0x20585348 ]
  # 2 	->	bytes:  24 	type:  541347412 [ 0x20444e54 ]
  # 3 	->	bytes:  56 	type:  541478473 [ 0x20464e49 ]
  # 4 	->	bytes:  47 	type:  542135112 [ 0x20505348 ]
  . . .
  # 24250 ->	end of file at offset  29000015
> last position read in file:  29000015  [file size:  29000015 ]
> the whole data file has been traversed
> data file closed
 
All the TopBlocks in the data file have been visited (we limited the visualization tojust the first 5). As a double check, we compare the file size with the number of read bytes, and they match. The helper functions print_info and plot_info are then called so that the user can inspect the file content:
In [10]:
%matplotlib inline
print_info(datagram_nr, datagram_dict)
plot_info(datagram_dict)
 

> data file content:
  . total blocks:	 24250
  . blocks list:
  [type] --> [counter]
   #0x20504348 --> 8752
   #0x20524e53 --> 3906
   #0x20424d52 --> 3906
   #0x20535352 --> 3905
   #0x20525947 --> 2663
   #0x20444954 --> 271
   #0x20535047 --> 271
   #0x20534f50 --> 271
   #0x20314345 --> 271
   #0x20435653 --> 10
   #0x20564544 --> 4
   #0x2032464f --> 4
   #0x20325644 --> 4
   #0x20584946 --> 2
   #0x20505348 --> 1
   #0x20585348 --> 1
   #0x20465648 --> 1
   #0x20495353 --> 1
   #0x20444e54 --> 1
   #0x20495250 --> 1
   #0x2049424d --> 1
   #0x20464e49 --> 1
   #0x20505446 --> 1
   #0xd484f45 --> 1
 
 
We now call the methods introduced for printing out some metadata:
In [11]:
print_tnd(hsx_data)
print_inf(hsx_data)
 
> TND: 2008-02-27 08:53:08
> INF:
  . surveyor: J. Doe
  . boat:     RV Quicky
  . project:  Full CVRG
  . area:     OpAREA
 
Now we evaluate the attitude data present in the survey line:
In [12]:
print_attitude(hsx_data)
plot_attitude(hsx_data)
 
                            Heave  Pitch  Roll
2008-02-27 08:53:08.284000  -0.08   0.65 -0.35
2008-02-27 08:53:08.312000  -0.08   0.65 -0.34
2008-02-27 08:53:08.346000  -0.08   0.64 -0.33
2008-02-27 08:53:08.375000  -0.08   0.64 -0.32
2008-02-27 08:53:08.409000  -0.08   0.65 -0.31
<class 'pandas.tseries.index.DatetimeIndex'>
[2008-02-27 08:53:08.284000, ..., 2008-02-27 08:57:39.006000]
Length: 8752, Freq: None, Timezone: None

              Heave        Pitch         Roll
count  8752.000000  8752.000000  8752.000000
mean     -0.000110     1.062823    -0.401004
std       0.012229     0.118978     0.076183
min      -0.080000     0.640000    -0.880000
25%       0.000000     0.990000    -0.430000
50%       0.000000     1.060000    -0.400000
75%       0.000000     1.150000    -0.360000
max       0.020000     1.360000    -0.160000
 
 

Exploring the Bathimetric Data

 
As the final step, we evaluate the bathymetric content. Each printed line shows:
  • the acquisition time,
  • the type of data present in the TopBlock (in such a case, slant range, beam intensity, and quality flag),
  • the number of beams,
  • the device number
  • a collection of flags (more info in the HSX Format Specification)
  • the ping number
  • the sound speed
We here visualize just the first five pings of the Pandas dataframe:
In [13]:
print_rmb(hsx_data)
 
> Raw Multi Beam topblocks info:
                           Beam Data  Beams  Dev#   Flags     Ping#   SSP
2008-02-27 08:53:08.113000    0x1801    101     1  0x0000  11820223  1499
2008-02-27 08:53:08.176000    0x1801    101     1  0x0000  11820224  1499
2008-02-27 08:53:08.246000    0x1801    101     1  0x0000  11820225  1499
2008-02-27 08:53:08.316000    0x1801    101     1  0x0000  11820226  1499
2008-02-27 08:53:08.387000    0x1801    101     1  0x0000  11820227  1499
 
Using the function print_rmb_ping we can pass the number of the ping (i.e., the 10th) that we want to analyze (we visualize just the first 10 beams):
In [14]:
print_rmb_ping(hsx_data, 10)
 
> printing #10 of 3906 pings
Beam	SlRng	Level	Quality
#  0	78.400	30720	  3
#  1	76.600	 1072	  3
#  2	74.900	  928	  3
#  3	75.100	 1088	  3
#  4	74.900	 1520	  3
#  5	72.700	 1424	  3
#  6	67.400	 1904	  3
#  7	64.200	 2048	  3
#  8	61.400	 2240	  3
#  9	59.100	 2192	  3
 
Having a bunch of numbers is usually quite difficult to interpret, so we plot the information for the same ping:
In [15]:
plot_rmb_ping(hsx_data, 10)
 
 
This last plot is quite interesting, but it would be nice to dynamically interact with the plot moving between different pings (e.g., using a widget) or to animate it (e.g., GIF). Since this post is growing too much, these ideas will be developed in future posts!
In [16]:
print("Done")
 
Done

 

tags: notebook; ascii; huddl