Data Analysis

Let’s take a look at what the data from the flight computer looks like. This data was recorded by an Arduino, logging telemetry once per second from the GPS and various sensors sensors and recorded them once a second. More details and the source code of the flight computer can be located at https://github.com/sea7aero/horizon2.

from preamble import *
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 5)
plt.rcParams['figure.dpi']= 150

directory = '../data'

# The flight tracker records invalid values as "*", so we want to replace those with Not a Number (NaN).
data = pd.read_csv(directory + '/raw-data.csv', na_values="*")
data.head()
millis year month day hour minute second voltage satellites hdop gps_altitude_m course speed_mps latitude longitude temperature_C pressure_Pa pressure_altitude_m humidity_RH Unnamed: 19
0 1544 NaN NaN NaN NaN NaN NaN 4.40 NaN NaN NaN NaN NaN NaN NaN 19.31 93891.82 638.10 42.74 NaN
1 2546 2000.0 0.0 0.0 0.0 0.0 0.0 4.37 0.0 99.99 NaN NaN NaN NaN NaN 19.36 93894.09 637.90 43.11 NaN
2 3547 2000.0 0.0 0.0 0.0 0.0 0.0 4.40 0.0 99.99 NaN NaN NaN NaN NaN 19.38 93892.00 638.09 42.69 NaN
3 4550 2000.0 0.0 0.0 0.0 0.0 0.0 4.40 0.0 99.99 NaN NaN NaN NaN NaN 19.39 93899.64 637.41 42.37 NaN
4 5551 2000.0 0.0 0.0 0.0 0.0 0.0 4.40 0.0 99.99 NaN NaN NaN NaN NaN 19.41 93896.39 637.70 42.41 NaN

Clean up the data

The raw data is a bit noisy - especially the first dozen seconds or so before the GPS acquired its lock - so the first step is to clean up the invalid data and a few other transformations in order to make it easier to work with.

# The tracker stores each part of the timestamp in a different column. To make it easier for us to work with and
# graph, we convert those columns into a single timestamp column, and make it the index of the Pandas dataframe.
datetime_cols = ["year", "month", "day", "hour", "minute", "second"]
timestamps = data[datetime_cols].apply(lambda x: "{:.0f}-{:.0f}-{:.0f} {:.0f}:{:.0f}:{:.0f}".format(*x), axis=1)
data["timestamp"] = pd.to_datetime(timestamps, errors="coerce")
data.drop(datetime_cols, axis=1)

# The first few rows don't have a valid timestamp at all; we'll just get rid of them.
data = data[data["timestamp"].notnull()]
data.index = data["timestamp"]
data = data.drop(["timestamp"], axis=1)

# # Fill any missing data with the last known value.  Then backfill missing data at the beginning of the file.
data = data.ffill().bfill()

# # Removes the extraneous "unnamed" column.
data = data.dropna(how="all", axis=1)

data.head()
millis year month day hour minute second voltage satellites hdop gps_altitude_m course speed_mps latitude longitude temperature_C pressure_Pa pressure_altitude_m humidity_RH
timestamp
2021-09-25 17:26:17 21594 2021.0 9.0 25.0 17.0 26.0 17.0 4.40 0.0 99.99 656.1 0.0 0.1 47.160809 -120.84832 19.56 93893.46 637.96 42.33
2021-09-25 17:26:18 22598 2021.0 9.0 25.0 17.0 26.0 18.0 4.40 0.0 99.99 656.1 0.0 0.1 47.160809 -120.84832 19.56 93896.89 637.65 42.08
2021-09-25 17:26:19 23600 2021.0 9.0 25.0 17.0 26.0 19.0 4.37 0.0 99.99 656.1 0.0 0.1 47.160809 -120.84832 19.58 93898.86 637.48 41.96
2021-09-25 17:26:20 24604 2021.0 9.0 25.0 17.0 26.0 20.0 4.40 0.0 99.99 656.1 0.0 0.1 47.160809 -120.84832 19.59 93894.25 637.89 42.06
2021-09-25 17:26:21 25607 2021.0 9.0 25.0 17.0 26.0 21.0 4.40 0.0 99.99 656.1 0.0 0.1 47.160809 -120.84832 19.59 93891.08 638.17 42.25

That’s much better, but we turned the tracker on before launch, and that data is kind of boring, so let us compute some interesting time spans and crop our data to just the mission data.

# Figure out the launch time by when it starts ascending.
ascent_rate = data['gps_altitude_m'].diff()
launch_index = np.argmax(ascent_rate > 3)
launch_millis = data['millis'][launch_index]
launch_time = data.index[launch_index]
data['mission_millis'] = data['millis'] - launch_millis

burst_index = np.argmax(data['gps_altitude_m'])
burst_time = data.index[burst_index]

# Figure out the landing time by when it stopped descending.
landing_index = len(ascent_rate) - np.argmax(ascent_rate.iloc[::-1] < -3)
landing_millis = data['millis'][launch_index]
landing_time = data.index[landing_index] # TODO: Determine this value, it may not be the last data.

print("Launch Time (UTC) : {}".format(launch_time))
print("Burst Time        : {}".format(burst_time))
print("Landing Time      : {}".format(landing_time))

ascent_duration = burst_time - launch_time
descent_duration = landing_time - burst_time
mission_duration = ascent_duration + descent_duration

print("Mission Duration  : {}".format(mission_duration))
print("Ascent Duration   : {}".format(ascent_duration))
print("Descent Duration  : {}".format(descent_duration))
Launch Time (UTC) : 2021-09-25 17:28:11
Burst Time        : 2021-09-25 20:04:25
Landing Time      : 2021-09-25 20:41:37
Mission Duration  : 0 days 03:13:26
Ascent Duration   : 0 days 02:36:14
Descent Duration  : 0 days 00:37:12
mission_data = data[launch_index:landing_index].copy()
mission_data.tail()
millis year month day hour minute second voltage satellites hdop gps_altitude_m course speed_mps latitude longitude temperature_C pressure_Pa pressure_altitude_m humidity_RH mission_millis
timestamp
2021-09-25 20:41:33 11728210 2021.0 9.0 25.0 20.0 41.0 33.0 4.37 12.0 0.73 364.4 34.4 4.8 47.188889 -119.438377 23.58 96866.77 377.97 31.80 11593195
2021-09-25 20:41:34 11729211 2021.0 9.0 25.0 20.0 41.0 34.0 4.37 12.0 0.79 358.9 51.6 3.2 47.188900 -119.438339 23.63 96920.57 373.33 31.76 11594196
2021-09-25 20:41:35 11730215 2021.0 9.0 25.0 20.0 41.0 35.0 4.37 12.0 0.79 353.4 76.4 2.6 47.188892 -119.438301 23.68 96992.07 367.16 31.70 11595200
2021-09-25 20:41:36 11731216 2021.0 9.0 25.0 20.0 41.0 36.0 4.37 12.0 0.90 348.0 97.6 3.0 47.188877 -119.438271 23.73 97054.25 361.80 31.62 11596201
2021-09-25 20:41:37 11732219 2021.0 9.0 25.0 20.0 41.0 37.0 4.37 12.0 0.75 342.9 112.1 3.5 47.188862 -119.438232 23.77 97115.67 356.50 31.64 11597204

Camera Shutoff

In Graupel-1, we had a failure of the GoPro camera at around the Tropopause. We determined this was due to the 16850 battery we were using to charge the GoPro getting cold, reducing its output voltage, and causing a shut off.

For Graupel-2, we used a large, commercial external battery bank to power both the GoPro and the flight computer. We insulated the battery with some foam and added a hand warmer for good measure.

It looks like it worked well to keep the battery above critical voltages:

ax = pretty_plots()
ax.plot(mission_data['temperature_C'])
ax2 = ax.twinx()
ax2.plot(mission_data['voltage'].ewm(span=30).mean(), color='C1')
ax.set_title("Effect of temperature on voltage")
ax.set_ylabel("Temperature (C)")
ax2.set_ylabel("Voltage (V)")

print(f"Minimum temperature {mission_data['temperature_C'].min()}C")
print(f"Minimum voltage {mission_data['voltage'].min()}V")
Minimum temperature -53.3C
Minimum voltage 4.13V
_images/data-analysis_8_1.png

Altitude

The next thing to look at is the altitude, of course. In our case, we have 2 sources of data for the altitude: the value reported by the GPS and a value derived from the barometric pressure sensor. Let’s see what we’ve got…

ax = pretty_plots()
ax.plot(mission_data[["gps_altitude_m", "pressure_altitude_m"]])
ax.set_ylabel("altitude (meters)")
ax.legend(["gps", "barometric"])
ax.set_title('Raw altitude data')

difference = (mission_data["gps_altitude_m"] - mission_data["pressure_altitude_m"]).max()
print(f"Maximum altitude discrepency: {difference:.0f} meters")
Maximum altitude discrepency: 7005 meters
_images/data-analysis_10_1.png

Correcting the barometric altitude

From Graupel-1, we learned that the tracker computes altitude from the barometric pressue using a naive algorithm that assumes you are in the Troposphere, which is the cause of the variation betweent he two readings. Let’s go ahead and apply what we learned there to this data.

# Here, we're loading up the ISA atmopsheric conditions and defining some constants.
isa = pd.read_csv(directory + '/standard-atmosphere.csv', index_col="layer").sort_values('base_pressure_Pa')

def annotate_layers(ax, label=True, max_layer=10):
    for index, layer in isa.iterrows():
        if(index < max_layer):
            altitude=layer['geopotential_altitude_m']
            ax.axhline(altitude, color='black', ls='-.', lw=0.25)
            if(label):
                ax.text(0.01, altitude+400, layer['name'], transform=ax.get_yaxis_transform())
        
display(isa)
name geopotential_altitude_m geometric_altitude_m lapse_rate_K_m base_temperature_C base_temperature_K base_pressure_Pa base_density_kg_m3
layer
7 Mesopause 84852 86000 NaN -86.28 186.87 0.37 NaN
6 Mesosphere 71000 71802 -0.0020 -58.50 214.65 3.96 NaN
5 Mesosphere 51000 51413 -0.0028 -2.50 270.65 66.94 NaN
4 Stratopause 47000 47350 0.0000 -2.50 270.65 110.91 0.0020
3 Stratosphere 32000 32162 0.0028 -44.50 228.65 868.02 0.0132
2 Stratosphere 20000 20063 0.0010 -56.50 216.65 5474.90 0.0880
1 Tropopause 11000 11019 0.0000 -56.50 216.65 22632.00 0.3639
0 Troposphere -610 -611 -0.0065 19.00 292.15 108900.00 1.2985
universal_gas_constant = 8.31432 # R, N*m/mol*K
molar_mass_dry_air = 0.0289644 # M, kg/mol
gas_constant_dry_air = universal_gas_constant / molar_mass_dry_air
gravitational_acceleration = 9.80665 # g0, m/s^2
# R / g0 * M
altitude_constants = -1.0 * gas_constant_dry_air / gravitational_acceleration

def calc_altitude(pressure):
    layer = isa['base_pressure_Pa'].searchsorted(pressure)
    parameters = isa.iloc[layer]
    base_altitude = parameters['geopotential_altitude_m']
    base_temperature = parameters['base_temperature_K']
    pressure_ratio = pressure / parameters['base_pressure_Pa']
    lapse_rate = parameters['lapse_rate_K_m']
    
    if lapse_rate == 0:
        factor = altitude_constants * base_temperature 
        return base_altitude + factor * math.log(pressure_ratio)
    else:
        factor = base_temperature / lapse_rate
        exponent = altitude_constants * lapse_rate
        return base_altitude + factor * (pow(pressure_ratio, exponent)-1)
    
calc_altitude = np.vectorize(calc_altitude)
# From: https://commons.erau.edu/cgi/viewcontent.cgi?article=1124&context=ijaaa
def calc_altitude_temp_adjusted(pressure_Pa, temperature_C):
    temperature_K = temperature_C + 273.15
    layer = isa['base_pressure_Pa'].searchsorted(pressure_Pa)
    parameters = isa.iloc[layer]
    
    base_altitude_m = parameters['geopotential_altitude_m']
    base_temperature_K = parameters['base_temperature_K']
    temp_ratio = base_temperature_K / temperature_K
    
    pressure_ratio = pressure_Pa / parameters['base_pressure_Pa']
    
    density_ratio = pressure_ratio * temp_ratio
    lapse_rate = parameters['lapse_rate_K_m']
    
    if lapse_rate == 0:
        factor = altitude_constants * base_temperature_K
        return base_altitude_m + factor * math.log(density_ratio)
    else:
        factor = base_temperature_K / lapse_rate
        exponent = altitude_constants * lapse_rate
        return base_altitude_m + factor * (pow(density_ratio, exponent)-1)
    
calc_altitude_temp_adjusted = np.vectorize(calc_altitude_temp_adjusted)
def calc_altitude_blended(pressure_Pa, temperature_C):
    if(pressure_Pa > 22632):
        return calc_altitude(pressure_Pa)
    else:
        return calc_altitude_temp_adjusted(pressure_Pa, temperature_C)
    
calc_altitude_blended = np.vectorize(calc_altitude_blended)
mission_data['blended_altitude_m'] = calc_altitude_blended(mission_data['pressure_Pa'], mission_data['temperature_C'])
ax = pretty_plots()
ax.plot(mission_data[['gps_altitude_m', 'blended_altitude_m']])
ax.set_ylabel("altitude (m)")
ax.legend(["gps", "better barometric"]);
ax.set_title("Temperature adjusted altitude")

difference = mission_data["gps_altitude_m"] - mission_data["blended_altitude_m"]
print(f"Maximum altitude discrepency: {difference.max():.0f} meters")
Maximum altitude discrepency: 1046 meters
_images/data-analysis_16_1.png

GPS altitude noise

Okay, now the barometric altitude looks better, but what about those weird jags in the GPS data? Those were not there in the Graupel-1 launch. Let’s take a closer look.

import datetime
window = datetime.timedelta(minutes=30)

def plot_window(title, start, end):
    fig, ax = plt.subplots(tight_layout=True)
    
    locator = mdates.MinuteLocator(byminute=range(60), interval=4)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)
    
    ax.plot(data[start:end]['gps_altitude_m'])
    ax.set_title(title)
        
plot_window('At launch', launch_time, launch_time + window)
plot_window('Before burst', burst_time - window, burst_time)
plot_window('After burst', burst_time, burst_time + window)
plot_window('At landing', landing_time - window, landing_time)
_images/data-analysis_18_0.png _images/data-analysis_18_1.png _images/data-analysis_18_2.png _images/data-analysis_18_3.png

Smoothing noise

The noise in the GPS altitude is still a mystery to me. It seems that roughly every 4 minutes, through the entire flight, the GPS fails to update its altitude. It does not appear to be correlated to any other data as far as I can tell. If anyone has an idea, I’d love to hear it.

However, we can correct the data fairly well.

The first thing we can do is simply discard the obvious outliers where the GPS data was dropped using a “hampel” filter.

from hampel import hampel

filtered_altitude = hampel(mission_data['gps_altitude_m'], window_size=10, n=3, imputation=True)
ax = pretty_plots()

ax.plot(mission_data['gps_altitude_m'], label="unfiltered")
ax.plot(filtered_altitude, label="filtered")
ax.legend();
_images/data-analysis_20_0.png

Next we can detect the discrepencies pretty easily by noting that the difference between one data point and the next is exactly 0. This happens rather rarely in real data, so we can simply throw those data points out and interpolate to fill the gaps.

# Replaces data points where the differences is 0 with NaN.
reject_noise = filtered_altitude.where(filtered_altitude.diff() != 0)

# Fill those NaNs through linear interpolation.
mission_data['altitude_m'] = reject_noise.interpolate()


ax = pretty_plots()
ax.set_title('Smoothed altitude')
ax.plot(mission_data['altitude_m'], label='altitude')
ax.set_ylabel('altitude (m)')

ax2 = ax.twinx()
ax2.plot(mission_data['altitude_m'].diff(), color='C1');
ax2.set_ylabel('ascent rate (m/s)')
Text(0, 0.5, 'ascent rate (m/s)')
_images/data-analysis_22_1.png

Alright, that looks pretty reasonable, though with some fairly rough nosie during a portion of the descent phase.

Altitude statistics

Finally, we can now compute some interesting altitude statistics and use them to answer some questions about our mission.

burst_altitude_index = np.argmax(mission_data['altitude_m'])
burst_altitude_m = mission_data['altitude_m'].max() * units.m
burst_altitude_ft = burst_altitude_m.to(units.ft)

print(f"Burst altitude: {burst_altitude_m.magnitude:0.1f} meters, ({burst_altitude_ft.magnitude:0.1f} feet)")
Burst altitude: 33502.8 meters, (109917.3 feet)
# We can smooth out a bit of the noise in the ascent rate data with a simple exponential moving average.
ascent_rate = mission_data['altitude_m'].diff()
mission_data['ascent_rate_mps'] = ascent_rate.ewm(span=30).mean()

ax = pretty_plots()
ax.plot(mission_data['ascent_rate_mps'])

ascent_phase = mission_data[mission_data.index < burst_time].copy()
descent_phase = mission_data[mission_data.index > burst_time].copy()

print("Maximum ascent rate: {0:0.2f} m/s".format(ascent_phase['ascent_rate_mps'].max()))
print("Average ascent rate: {0:0.2f} m/s".format(ascent_phase['ascent_rate_mps'].mean()))
print()
print("Maximum descent rate: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'].min()*-1))
print("Average descent rate: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'].mean()*-1))
print("Ascent rate at landing: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'][-1]*-1))
Maximum ascent rate: 6.36 m/s
Average ascent rate: 3.53 m/s

Maximum descent rate: 60.84 m/s
Average descent rate: 14.91 m/s
Ascent rate at landing: 4.87 m/s
_images/data-analysis_26_1.png

Other Fun Data

Horizontal speed

In Graupel-1, our balloon reached horizontal speeds of up to 90 mph. This time, our balloon strolled along a more leisurely pace. This is good, because our ascent rate was lower than anticipated… if it was any faster, we would have had to drive to Idaho.

mission_data['speed_mps'] = mission_data['speed_mps'].ewm(alpha=0.05).mean()

fig, ax = plt.subplots()
annotate_layers(ax, label=True, max_layer=4)

ax.plot(mission_data['speed_mps'], mission_data['altitude_m'])
#ax.plot(descent_phase['speed_mps'], descent_phase['altitude_m'])

ax.set_title("Speed at altitude")
ax.set_xlabel("speed (m/s)")
ax.set_ylabel("altitude (meters)")

mph = units.mile / units.hour
max_speed = mission_data['speed_mps'].max() * units.m / units.s
print(f"Max speed: {max_speed.to(mph):0.2f}")
Max speed: 97.55 mile / hour
_images/data-analysis_29_1.png

From the above data, we can see why our balloon travelled so far east from its predicted flight path. Our original estimated ascent rate was 4.9 m/s, but since we ran out of helium before hitting our target lift, our actual ascent rate averaged only 3.9 m/s. This caused our balloon to drift through the jet stream for a longer period of time, where it was being blown around at a brisk 35 m/s (~80 mph)!

Weather Data

fig, ax = plt.subplots()

annotate_layers(ax, label=True, max_layer=4)
ax.plot(ascent_phase['temperature_C'], ascent_phase['altitude_m'], label="ascent")
ax.plot(descent_phase['temperature_C'], descent_phase['altitude_m'], label="descent");
ax.set_title("Temperature at Altitude")
ax.set_ylabel("altitude (m)")
ax.set_xlabel("temperature (C)")
Text(0.5, 0, 'temperature (C)')
_images/data-analysis_32_1.png
# Write the missison data back out to a csv file.
mission_data.to_csv(directory + '/clean-data.csv')