Data Analysis¶

Let’s take a look at what the data from the flight computer looks like. This data was recorded by an Arduino, logging telemetry once per second from the GPS and various sensors sensors and recorded them once a second. More details and the source code of the flight computer can be located at https://github.com/sea7aero/horizon2.

from preamble import *
%matplotlib inline
plt.rcParams['figure.figsize'] = (16, 5)
plt.rcParams['figure.dpi']= 150

directory = '../data'

# The flight tracker records invalid values as "*", so we want to replace those with Not a Number (NaN).
data = pd.read_csv(directory + '/raw-data.csv', na_values="*")
data.head()

	millis	year	month	day	hour	minute	second	voltage	satellites	hdop	gps_altitude_m	course	speed_mps	latitude	longitude	temperature_C	pressure_Pa	pressure_altitude_m	humidity_RH	Unnamed: 19
0	1544	NaN	NaN	NaN	NaN	NaN	NaN	4.40	NaN	NaN	NaN	NaN	NaN	NaN	NaN	19.31	93891.82	638.10	42.74	NaN
1	2546	2000.0	0.0	0.0	0.0	0.0	0.0	4.37	0.0	99.99	NaN	NaN	NaN	NaN	NaN	19.36	93894.09	637.90	43.11	NaN
2	3547	2000.0	0.0	0.0	0.0	0.0	0.0	4.40	0.0	99.99	NaN	NaN	NaN	NaN	NaN	19.38	93892.00	638.09	42.69	NaN
3	4550	2000.0	0.0	0.0	0.0	0.0	0.0	4.40	0.0	99.99	NaN	NaN	NaN	NaN	NaN	19.39	93899.64	637.41	42.37	NaN
4	5551	2000.0	0.0	0.0	0.0	0.0	0.0	4.40	0.0	99.99	NaN	NaN	NaN	NaN	NaN	19.41	93896.39	637.70	42.41	NaN

Clean up the data¶

The raw data is a bit noisy - especially the first dozen seconds or so before the GPS acquired its lock - so the first step is to clean up the invalid data and a few other transformations in order to make it easier to work with.

# The tracker stores each part of the timestamp in a different column. To make it easier for us to work with and
# graph, we convert those columns into a single timestamp column, and make it the index of the Pandas dataframe.
datetime_cols = ["year", "month", "day", "hour", "minute", "second"]
timestamps = data[datetime_cols].apply(lambda x: "{:.0f}-{:.0f}-{:.0f} {:.0f}:{:.0f}:{:.0f}".format(*x), axis=1)
data["timestamp"] = pd.to_datetime(timestamps, errors="coerce")
data.drop(datetime_cols, axis=1)

# The first few rows don't have a valid timestamp at all; we'll just get rid of them.
data = data[data["timestamp"].notnull()]
data.index = data["timestamp"]
data = data.drop(["timestamp"], axis=1)

# # Fill any missing data with the last known value.  Then backfill missing data at the beginning of the file.
data = data.ffill().bfill()

# # Removes the extraneous "unnamed" column.
data = data.dropna(how="all", axis=1)

data.head()

	millis	year	month	day	hour	minute	second	voltage	satellites	hdop	gps_altitude_m	course	speed_mps	latitude	longitude	temperature_C	pressure_Pa	pressure_altitude_m	humidity_RH
timestamp
2021-09-25 17:26:17	21594	2021.0	9.0	25.0	17.0	26.0	17.0	4.40	0.0	99.99	656.1	0.0	0.1	47.160809	-120.84832	19.56	93893.46	637.96	42.33
2021-09-25 17:26:18	22598	2021.0	9.0	25.0	17.0	26.0	18.0	4.40	0.0	99.99	656.1	0.0	0.1	47.160809	-120.84832	19.56	93896.89	637.65	42.08
2021-09-25 17:26:19	23600	2021.0	9.0	25.0	17.0	26.0	19.0	4.37	0.0	99.99	656.1	0.0	0.1	47.160809	-120.84832	19.58	93898.86	637.48	41.96
2021-09-25 17:26:20	24604	2021.0	9.0	25.0	17.0	26.0	20.0	4.40	0.0	99.99	656.1	0.0	0.1	47.160809	-120.84832	19.59	93894.25	637.89	42.06
2021-09-25 17:26:21	25607	2021.0	9.0	25.0	17.0	26.0	21.0	4.40	0.0	99.99	656.1	0.0	0.1	47.160809	-120.84832	19.59	93891.08	638.17	42.25

That’s much better, but we turned the tracker on before launch, and that data is kind of boring, so let us compute some interesting time spans and crop our data to just the mission data.

# Figure out the launch time by when it starts ascending.
ascent_rate = data['gps_altitude_m'].diff()
launch_index = np.argmax(ascent_rate > 3)
launch_millis = data['millis'][launch_index]
launch_time = data.index[launch_index]
data['mission_millis'] = data['millis'] - launch_millis

burst_index = np.argmax(data['gps_altitude_m'])
burst_time = data.index[burst_index]

# Figure out the landing time by when it stopped descending.
landing_index = len(ascent_rate) - np.argmax(ascent_rate.iloc[::-1] < -3)
landing_millis = data['millis'][launch_index]
landing_time = data.index[landing_index] # TODO: Determine this value, it may not be the last data.

print("Launch Time (UTC) : {}".format(launch_time))
print("Burst Time        : {}".format(burst_time))
print("Landing Time      : {}".format(landing_time))

ascent_duration = burst_time - launch_time
descent_duration = landing_time - burst_time
mission_duration = ascent_duration + descent_duration

print("Mission Duration  : {}".format(mission_duration))
print("Ascent Duration   : {}".format(ascent_duration))
print("Descent Duration  : {}".format(descent_duration))

Launch Time (UTC) : 2021-09-25 17:28:11
Burst Time        : 2021-09-25 20:04:25
Landing Time      : 2021-09-25 20:41:37
Mission Duration  : 0 days 03:13:26
Ascent Duration   : 0 days 02:36:14
Descent Duration  : 0 days 00:37:12

mission_data = data[launch_index:landing_index].copy()
mission_data.tail()

	millis	year	month	day	hour	minute	second	voltage	satellites	hdop	gps_altitude_m	course	speed_mps	latitude	longitude	temperature_C	pressure_Pa	pressure_altitude_m	humidity_RH	mission_millis
timestamp
2021-09-25 20:41:33	11728210	2021.0	9.0	25.0	20.0	41.0	33.0	4.37	12.0	0.73	364.4	34.4	4.8	47.188889	-119.438377	23.58	96866.77	377.97	31.80	11593195
2021-09-25 20:41:34	11729211	2021.0	9.0	25.0	20.0	41.0	34.0	4.37	12.0	0.79	358.9	51.6	3.2	47.188900	-119.438339	23.63	96920.57	373.33	31.76	11594196
2021-09-25 20:41:35	11730215	2021.0	9.0	25.0	20.0	41.0	35.0	4.37	12.0	0.79	353.4	76.4	2.6	47.188892	-119.438301	23.68	96992.07	367.16	31.70	11595200
2021-09-25 20:41:36	11731216	2021.0	9.0	25.0	20.0	41.0	36.0	4.37	12.0	0.90	348.0	97.6	3.0	47.188877	-119.438271	23.73	97054.25	361.80	31.62	11596201
2021-09-25 20:41:37	11732219	2021.0	9.0	25.0	20.0	41.0	37.0	4.37	12.0	0.75	342.9	112.1	3.5	47.188862	-119.438232	23.77	97115.67	356.50	31.64	11597204

Camera Shutoff¶

In Graupel-1, we had a failure of the GoPro camera at around the Tropopause. We determined this was due to the 16850 battery we were using to charge the GoPro getting cold, reducing its output voltage, and causing a shut off.

For Graupel-2, we used a large, commercial external battery bank to power both the GoPro and the flight computer. We insulated the battery with some foam and added a hand warmer for good measure.

It looks like it worked well to keep the battery above critical voltages:

ax = pretty_plots()
ax.plot(mission_data['temperature_C'])
ax2 = ax.twinx()
ax2.plot(mission_data['voltage'].ewm(span=30).mean(), color='C1')
ax.set_title("Effect of temperature on voltage")
ax.set_ylabel("Temperature (C)")
ax2.set_ylabel("Voltage (V)")

print(f"Minimum temperature {mission_data['temperature_C'].min()}C")
print(f"Minimum voltage {mission_data['voltage'].min()}V")

Minimum temperature -53.3C
Minimum voltage 4.13V

Altitude¶

The next thing to look at is the altitude, of course. In our case, we have 2 sources of data for the altitude: the value reported by the GPS and a value derived from the barometric pressure sensor. Let’s see what we’ve got…

ax = pretty_plots()
ax.plot(mission_data[["gps_altitude_m", "pressure_altitude_m"]])
ax.set_ylabel("altitude (meters)")
ax.legend(["gps", "barometric"])
ax.set_title('Raw altitude data')

difference = (mission_data["gps_altitude_m"] - mission_data["pressure_altitude_m"]).max()
print(f"Maximum altitude discrepency: {difference:.0f} meters")

Maximum altitude discrepency: 7005 meters

Correcting the barometric altitude¶

From Graupel-1, we learned that the tracker computes altitude from the barometric pressue using a naive algorithm that assumes you are in the Troposphere, which is the cause of the variation betweent he two readings. Let’s go ahead and apply what we learned there to this data.

# Here, we're loading up the ISA atmopsheric conditions and defining some constants.
isa = pd.read_csv(directory + '/standard-atmosphere.csv', index_col="layer").sort_values('base_pressure_Pa')

def annotate_layers(ax, label=True, max_layer=10):
    for index, layer in isa.iterrows():
        if(index < max_layer):
            altitude=layer['geopotential_altitude_m']
            ax.axhline(altitude, color='black', ls='-.', lw=0.25)
            if(label):
                ax.text(0.01, altitude+400, layer['name'], transform=ax.get_yaxis_transform())
        
display(isa)

	name	geopotential_altitude_m	geometric_altitude_m	lapse_rate_K_m	base_temperature_C	base_temperature_K	base_pressure_Pa	base_density_kg_m3
layer
7	Mesopause	84852	86000	NaN	-86.28	186.87	0.37	NaN
6	Mesosphere	71000	71802	-0.0020	-58.50	214.65	3.96	NaN
5	Mesosphere	51000	51413	-0.0028	-2.50	270.65	66.94	NaN
4	Stratopause	47000	47350	0.0000	-2.50	270.65	110.91	0.0020
3	Stratosphere	32000	32162	0.0028	-44.50	228.65	868.02	0.0132
2	Stratosphere	20000	20063	0.0010	-56.50	216.65	5474.90	0.0880
1	Tropopause	11000	11019	0.0000	-56.50	216.65	22632.00	0.3639
0	Troposphere	-610	-611	-0.0065	19.00	292.15	108900.00	1.2985

universal_gas_constant = 8.31432 # R, N*m/mol*K
molar_mass_dry_air = 0.0289644 # M, kg/mol
gas_constant_dry_air = universal_gas_constant / molar_mass_dry_air
gravitational_acceleration = 9.80665 # g0, m/s^2
# R / g0 * M
altitude_constants = -1.0 * gas_constant_dry_air / gravitational_acceleration

def calc_altitude(pressure):
    layer = isa['base_pressure_Pa'].searchsorted(pressure)
    parameters = isa.iloc[layer]
    base_altitude = parameters['geopotential_altitude_m']
    base_temperature = parameters['base_temperature_K']
    pressure_ratio = pressure / parameters['base_pressure_Pa']
    lapse_rate = parameters['lapse_rate_K_m']
    
    if lapse_rate == 0:
        factor = altitude_constants * base_temperature 
        return base_altitude + factor * math.log(pressure_ratio)
    else:
        factor = base_temperature / lapse_rate
        exponent = altitude_constants * lapse_rate
        return base_altitude + factor * (pow(pressure_ratio, exponent)-1)
    
calc_altitude = np.vectorize(calc_altitude)

# From: https://commons.erau.edu/cgi/viewcontent.cgi?article=1124&context=ijaaa
def calc_altitude_temp_adjusted(pressure_Pa, temperature_C):
    temperature_K = temperature_C + 273.15
    layer = isa['base_pressure_Pa'].searchsorted(pressure_Pa)
    parameters = isa.iloc[layer]
    
    base_altitude_m = parameters['geopotential_altitude_m']
    base_temperature_K = parameters['base_temperature_K']
    temp_ratio = base_temperature_K / temperature_K
    
    pressure_ratio = pressure_Pa / parameters['base_pressure_Pa']
    
    density_ratio = pressure_ratio * temp_ratio
    lapse_rate = parameters['lapse_rate_K_m']
    
    if lapse_rate == 0:
        factor = altitude_constants * base_temperature_K
        return base_altitude_m + factor * math.log(density_ratio)
    else:
        factor = base_temperature_K / lapse_rate
        exponent = altitude_constants * lapse_rate
        return base_altitude_m + factor * (pow(density_ratio, exponent)-1)
    
calc_altitude_temp_adjusted = np.vectorize(calc_altitude_temp_adjusted)

def calc_altitude_blended(pressure_Pa, temperature_C):
    if(pressure_Pa > 22632):
        return calc_altitude(pressure_Pa)
    else:
        return calc_altitude_temp_adjusted(pressure_Pa, temperature_C)
    
calc_altitude_blended = np.vectorize(calc_altitude_blended)

mission_data['blended_altitude_m'] = calc_altitude_blended(mission_data['pressure_Pa'], mission_data['temperature_C'])
ax = pretty_plots()
ax.plot(mission_data[['gps_altitude_m', 'blended_altitude_m']])
ax.set_ylabel("altitude (m)")
ax.legend(["gps", "better barometric"]);
ax.set_title("Temperature adjusted altitude")

difference = mission_data["gps_altitude_m"] - mission_data["blended_altitude_m"]
print(f"Maximum altitude discrepency: {difference.max():.0f} meters")

Maximum altitude discrepency: 1046 meters

GPS altitude noise¶

Okay, now the barometric altitude looks better, but what about those weird jags in the GPS data? Those were not there in the Graupel-1 launch. Let’s take a closer look.

import datetime
window = datetime.timedelta(minutes=30)

def plot_window(title, start, end):
    fig, ax = plt.subplots(tight_layout=True)
    
    locator = mdates.MinuteLocator(byminute=range(60), interval=4)
    formatter = mdates.ConciseDateFormatter(locator)
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(formatter)
    
    ax.plot(data[start:end]['gps_altitude_m'])
    ax.set_title(title)
        
plot_window('At launch', launch_time, launch_time + window)
plot_window('Before burst', burst_time - window, burst_time)
plot_window('After burst', burst_time, burst_time + window)
plot_window('At landing', landing_time - window, landing_time)

Smoothing noise¶

The noise in the GPS altitude is still a mystery to me. It seems that roughly every 4 minutes, through the entire flight, the GPS fails to update its altitude. It does not appear to be correlated to any other data as far as I can tell. If anyone has an idea, I’d love to hear it.

However, we can correct the data fairly well.

The first thing we can do is simply discard the obvious outliers where the GPS data was dropped using a “hampel” filter.

from hampel import hampel

filtered_altitude = hampel(mission_data['gps_altitude_m'], window_size=10, n=3, imputation=True)
ax = pretty_plots()

ax.plot(mission_data['gps_altitude_m'], label="unfiltered")
ax.plot(filtered_altitude, label="filtered")
ax.legend();

Next we can detect the discrepencies pretty easily by noting that the difference between one data point and the next is exactly 0. This happens rather rarely in real data, so we can simply throw those data points out and interpolate to fill the gaps.

# Replaces data points where the differences is 0 with NaN.
reject_noise = filtered_altitude.where(filtered_altitude.diff() != 0)

# Fill those NaNs through linear interpolation.
mission_data['altitude_m'] = reject_noise.interpolate()


ax = pretty_plots()
ax.set_title('Smoothed altitude')
ax.plot(mission_data['altitude_m'], label='altitude')
ax.set_ylabel('altitude (m)')

ax2 = ax.twinx()
ax2.plot(mission_data['altitude_m'].diff(), color='C1');
ax2.set_ylabel('ascent rate (m/s)')

Text(0, 0.5, 'ascent rate (m/s)')

Alright, that looks pretty reasonable, though with some fairly rough nosie during a portion of the descent phase.

Altitude statistics¶

Finally, we can now compute some interesting altitude statistics and use them to answer some questions about our mission.

burst_altitude_index = np.argmax(mission_data['altitude_m'])
burst_altitude_m = mission_data['altitude_m'].max() * units.m
burst_altitude_ft = burst_altitude_m.to(units.ft)

print(f"Burst altitude: {burst_altitude_m.magnitude:0.1f} meters, ({burst_altitude_ft.magnitude:0.1f} feet)")

Burst altitude: 33502.8 meters, (109917.3 feet)

# We can smooth out a bit of the noise in the ascent rate data with a simple exponential moving average.
ascent_rate = mission_data['altitude_m'].diff()
mission_data['ascent_rate_mps'] = ascent_rate.ewm(span=30).mean()

ax = pretty_plots()
ax.plot(mission_data['ascent_rate_mps'])

ascent_phase = mission_data[mission_data.index < burst_time].copy()
descent_phase = mission_data[mission_data.index > burst_time].copy()

print("Maximum ascent rate: {0:0.2f} m/s".format(ascent_phase['ascent_rate_mps'].max()))
print("Average ascent rate: {0:0.2f} m/s".format(ascent_phase['ascent_rate_mps'].mean()))
print()
print("Maximum descent rate: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'].min()*-1))
print("Average descent rate: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'].mean()*-1))
print("Ascent rate at landing: {0:0.2f} m/s".format(descent_phase['ascent_rate_mps'][-1]*-1))

Maximum ascent rate: 6.36 m/s
Average ascent rate: 3.53 m/s

Maximum descent rate: 60.84 m/s
Average descent rate: 14.91 m/s
Ascent rate at landing: 4.87 m/s

Other Fun Data¶

Horizontal speed¶

In Graupel-1, our balloon reached horizontal speeds of up to 90 mph. This time, our balloon strolled along a more leisurely pace. This is good, because our ascent rate was lower than anticipated… if it was any faster, we would have had to drive to Idaho.

mission_data['speed_mps'] = mission_data['speed_mps'].ewm(alpha=0.05).mean()

fig, ax = plt.subplots()
annotate_layers(ax, label=True, max_layer=4)

ax.plot(mission_data['speed_mps'], mission_data['altitude_m'])
#ax.plot(descent_phase['speed_mps'], descent_phase['altitude_m'])

ax.set_title("Speed at altitude")
ax.set_xlabel("speed (m/s)")
ax.set_ylabel("altitude (meters)")

mph = units.mile / units.hour
max_speed = mission_data['speed_mps'].max() * units.m / units.s
print(f"Max speed: {max_speed.to(mph):0.2f}")

Max speed: 97.55 mile / hour

From the above data, we can see why our balloon travelled so far east from its predicted flight path. Our original estimated ascent rate was 4.9 m/s, but since we ran out of helium before hitting our target lift, our actual ascent rate averaged only 3.9 m/s. This caused our balloon to drift through the jet stream for a longer period of time, where it was being blown around at a brisk 35 m/s (~80 mph)!

Weather Data¶

fig, ax = plt.subplots()

annotate_layers(ax, label=True, max_layer=4)
ax.plot(ascent_phase['temperature_C'], ascent_phase['altitude_m'], label="ascent")
ax.plot(descent_phase['temperature_C'], descent_phase['altitude_m'], label="descent");
ax.set_title("Temperature at Altitude")
ax.set_ylabel("altitude (m)")
ax.set_xlabel("temperature (C)")

Text(0.5, 0, 'temperature (C)')

# Write the missison data back out to a csv file.
mission_data.to_csv(directory + '/clean-data.csv')