Storytelling Data Visualization on Exchange Rates¶
In this project, we will focus on explanatory data visualization and practice the following:
- How to use information design principles (familiarity and maximizing the data-ink ratio) to create better graphs for an audience.
- About the elements of a story and how to create storytelling data visualizations using
Matplotlib. - How to cretate visual patterns using Gestalt principles.
- How to guide the audience's attention with pre-attentive attributes.
- How to use Matplotlib built-in sytles — with a case study on the FiveThirtyEight style.
Introducing the Dataset¶
The dataset we'll use describes Euro daily exhcange rates between 1999 and 2021. The euro (symbolized with €) is the official currency in most of the countries of the European Union.
If the exchange rate of the euro to the US dollar is 1.5, you get 1.5 US dollars if you pay 1.0 euro (one euro has more value than one US dollar at this exchange rate).
Daria Chemkaeva put together the data set and made it available on Kaggle — the data source is the European Central Bank. Note that the dataset gets regualr updates — the one in this project is downloaded on March 2023.
Let's start by reading in the dataset.
import pandas as pd
exchange_rates = pd.read_csv('euro-daily-hist_1999_2022.csv')
exchange_rates.head()
| Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-27 | 1.5289 | 1.9558 | 5.5104 | 1.4479 | 1.0017 | 7.3690 | NaN | 23.826 | 7.4378 | ... | 4.8965 | NaN | 11.2108 | 1.4277 | NaN | NaN | 35.7020 | 20.4365 | 1.0865 | 18.7185 |
| 1 | 2023-01-26 | 1.5308 | 1.9558 | 5.5572 | 1.4568 | 1.0002 | 7.3893 | NaN | 23.818 | 7.4383 | ... | 4.8818 | NaN | 11.1763 | 1.4292 | NaN | NaN | 35.6870 | 20.4961 | 1.0895 | 18.6127 |
| 2 | 2023-01-25 | 1.5360 | 1.9558 | 5.5690 | 1.4544 | 1.0020 | 7.3778 | NaN | 23.808 | 7.4381 | ... | 4.9035 | NaN | 11.1335 | 1.4307 | NaN | NaN | 35.7180 | 20.4658 | 1.0878 | 18.6745 |
| 3 | 2023-01-24 | 1.5470 | 1.9558 | 5.6164 | 1.4517 | 1.0053 | 7.3642 | NaN | 23.874 | 7.4399 | ... | 4.9171 | NaN | 11.0995 | 1.4337 | NaN | NaN | 35.6090 | 20.4234 | 1.0858 | 18.7687 |
| 4 | 2023-01-23 | 1.5529 | 1.9558 | 5.6372 | 1.4523 | 1.0013 | 7.3730 | NaN | 23.881 | 7.4393 | ... | 4.9202 | NaN | 11.1183 | 1.4328 | NaN | NaN | 35.6300 | 20.4478 | 1.0871 | 18.6597 |
5 rows × 41 columns
exchange_rates.tail()
| Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6224 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
| 6225 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
| 6226 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
| 6227 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
| 6228 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
5 rows × 41 columns
exchange_rates.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 6229 entries, 0 to 6228 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Period\Unit: 6229 non-null object 1 [Australian dollar ] 6229 non-null object 2 [Bulgarian lev ] 5827 non-null object 3 [Brazilian real ] 5961 non-null object 4 [Canadian dollar ] 6229 non-null object 5 [Swiss franc ] 6229 non-null object 6 [Chinese yuan renminbi ] 5961 non-null object 7 [Cypriot pound ] 2346 non-null object 8 [Czech koruna ] 6229 non-null object 9 [Danish krone ] 6229 non-null object 10 [Estonian kroon ] 3130 non-null object 11 [UK pound sterling ] 6229 non-null object 12 [Greek drachma ] 520 non-null object 13 [Hong Kong dollar ] 6229 non-null object 14 [Croatian kuna ] 5941 non-null object 15 [Hungarian forint ] 6229 non-null object 16 [Indonesian rupiah ] 6229 non-null object 17 [Israeli shekel ] 5961 non-null object 18 [Indian rupee ] 5961 non-null object 19 [Iceland krona ] 3822 non-null float64 20 [Japanese yen ] 6229 non-null object 21 [Korean won ] 6229 non-null object 22 [Lithuanian litas ] 4159 non-null object 23 [Latvian lats ] 3904 non-null object 24 [Maltese lira ] 2346 non-null object 25 [Mexican peso ] 6229 non-null object 26 [Malaysian ringgit ] 6229 non-null object 27 [Norwegian krone ] 6229 non-null object 28 [New Zealand dollar ] 6229 non-null object 29 [Philippine peso ] 6229 non-null object 30 [Polish zloty ] 6229 non-null object 31 [Romanian leu ] 6167 non-null float64 32 [Russian rouble ] 5994 non-null object 33 [Swedish krona ] 6229 non-null object 34 [Singapore dollar ] 6229 non-null object 35 [Slovenian tolar ] 2085 non-null object 36 [Slovak koruna ] 2608 non-null object 37 [Thai baht ] 6229 non-null object 38 [Turkish lira ] 6167 non-null float64 39 [US dollar ] 6229 non-null object 40 [South African rand ] 6229 non-null object dtypes: float64(3), object(38) memory usage: 1.9+ MB
Data Cleaning¶
Before we start creating data visualizations, we'll need to do a bit of data cleaning. Our focus in this porject will be on the exchange rate between the euro and the American dollar. Below we do the following:
- Rename the
[US dollar]andPeriod\Unit:columns to something easier to type —US_dollarandTime. - Change the
Timecolumn to adatetimedata type. - Sort the values by
Timein ascending order. - Reset the index (and drop the initial index).
- Change the 'US_dollar' column to
floatdata type.
exchange_rates.rename(columns={'[US dollar ]': 'US_dollar',
'Period\\Unit:': 'Time'},
inplace=True)
exchange_rates['Time'] = pd.to_datetime(exchange_rates['Time'])
exchange_rates.sort_values('Time', inplace=True)
exchange_rates.reset_index(drop=True, inplace=True)
exchange_rates.head()
| Time | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | US_dollar | [South African rand ] | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
| 1 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
| 2 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
| 3 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
| 4 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
5 rows × 41 columns
In this project, we'll focus on Euro to US dollar exchange rate.
euro_to_dollar = exchange_rates[['Time', 'US_dollar']].copy()
euro_to_dollar['US_dollar'].value_counts()
US_dollar
- 62
1.2276 9
1.1215 8
1.1268 7
1.1305 7
..
1.3818 1
1.3591 1
1.3405 1
1.3580 1
1.0865 1
Name: count, Length: 3718, dtype: int64
There are 62 '-' characters in the US_dollar column, which need to be removed as invalid value.
euro_to_dollar = euro_to_dollar[euro_to_dollar['US_dollar'] != '-']
euro_to_dollar['US_dollar'] = euro_to_dollar['US_dollar'].astype(float)
Rolling Mean¶
Now that we're finished cleaning the data, we'll generate a line plot to visualize the evolution of the euro-dollar exhange rate.
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.show()
If we look at the line's shape, we see many small wiggles — rather than seeing a smooth line. The wiggles, however, have meaning: : they are the visual representation of the daily variation in the exchange rate. The rate goes up and down, up and down again, day to day. The rate only shows clear upward or downward trends in the longer run (months or years).
Depending on our goals, we may not want to show that daily variation on our graph. If we want to hide it and show only the long-term trends, we can use the rolling mean (also known as the moving average). Here we apply different moving windows for the visualization.
plt.figure(figsize=(9, 6))
plt.subplot(3,2,1)
plt.plot(euro_to_dollar['Time'], euro_to_dollar['US_dollar'])
plt.title('Roling Window = 0 Days', weight='bold')
windows = [7, 30, 50, 100, 365]
for i in range(5):
plt.subplot(3,2,i+2)
plt.plot(euro_to_dollar['Time'],
euro_to_dollar['US_dollar'].rolling(windows[i]).mean())
plt.title('Rolling Window = ' + str(windows[i]) + ' Days', weight='bold')
plt.tight_layout() # Auto-adjusts the padding between subplots
plt.show()
Let's choose the rolling window of 30 days for the following analysis.
euro_to_dollar['rolling_mean'] = euro_to_dollar['US_dollar'].rolling(30).mean()
euro_to_dollar
| Time | US_dollar | rolling_mean | |
|---|---|---|---|
| 0 | 1999-01-04 | 1.1789 | NaN |
| 1 | 1999-01-05 | 1.1790 | NaN |
| 2 | 1999-01-06 | 1.1743 | NaN |
| 3 | 1999-01-07 | 1.1632 | NaN |
| 4 | 1999-01-08 | 1.1659 | NaN |
| ... | ... | ... | ... |
| 6224 | 2023-01-23 | 1.0871 | 1.067830 |
| 6225 | 2023-01-24 | 1.0858 | 1.068817 |
| 6226 | 2023-01-25 | 1.0878 | 1.069927 |
| 6227 | 2023-01-26 | 1.0895 | 1.070747 |
| 6228 | 2023-01-27 | 1.0865 | 1.071560 |
6167 rows × 3 columns
Coming Up With an Idea¶
To create a story, remember that we need to arrange our data into a series of events that show change. Here are two story ideas for our data:
- Show how the euro-dollar rate changed during the 2007-2008's financial crisis. We can also show the data for 2016 and 2009 for comparison. We can use a line plot.
- Show comparatively how the euro-dollar rate changed under the last three US presidents (George W. Bush (2001-2009), Barack Obama (2009-2017), and Donald Trump (2017-2021)). We can use a line plot.
Financial Crisis¶
financial_crisis = euro_to_dollar.copy(
)[(euro_to_dollar['Time'].dt.year >= 2006
) & (euro_to_dollar['Time'].dt.year <= 2009)]
financial_crisis_7_8 = euro_to_dollar.copy(
)[(euro_to_dollar.Time.dt.year >= 2007
) & (euro_to_dollar.Time.dt.year <= 2008)]
### Adding the FiveThirtyEight style
import matplotlib.style as style
import warnings
warnings.filterwarnings("ignore")
style.use('fivethirtyeight')
### Adding the plot
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(financial_crisis['Time'],
financial_crisis['rolling_mean'],
linewidth=1, color='#A6D785')
### Highlighting the 2007-2008 period
ax.plot(financial_crisis_7_8['Time'],
financial_crisis_7_8['rolling_mean'],
linewidth=3, color='#e23d28')
### Highlighting the peak of the crisis
ax.axvspan(xmin=13960, xmax=14140, ymin=0.09,
alpha=0.3, color='grey')
### Adding separate tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])
x = 13110
for year in ['2006', '2007', '2008', '2009', '2010']:
ax.text(x, 1.13, year, alpha=0.5, fontsize=11)
x += 365
y = 1.193
for rate in ['1.2', '1.3', '1.4', '1.5']:
ax.text(13010, y, rate, alpha=0.5, fontsize=11)
y += 0.1
### Adding a title and a subtitle
ax.text(13010, 1.67, "Euro-USD rate peaked at 1.59 during 2007-2008's finacial crisis",
weight='bold')
ax.text(13010, 1.63, 'Eoro-USD exchange rates between 2006 and 2010',
size=12)
### Adding a signature
ax.text(13010, 1.07, 'Esteban Rosal' + ' '*108 + 'Source: European Central Bank',
color='#f0f0f0', backgroundcolor='#4d4d4d', size=10)
### Add some Transparency to the grid
ax.grid(alpha=0.5)
plt.show()
The Three US Presidencies¶
bush_obama_trump = euro_to_dollar.copy(
)[(euro_to_dollar['Time'].dt.year >= 2001) & (euro_to_dollar['Time'].dt.year < 2021)]
bush = bush_obama_trump.copy(
)[bush_obama_trump['Time'].dt.year < 2009]
obama = bush_obama_trump.copy(
)[(bush_obama_trump['Time'].dt.year >= 2009) & (bush_obama_trump['Time'].dt.year < 2017)]
trump = bush_obama_trump.copy(
)[(bush_obama_trump['Time'].dt.year >= 2017) & (bush_obama_trump['Time'].dt.year < 2021)]
Below, you'll notice we used matplotlib's functional approach to build the graphs. We use this approach because it offers more flexibility in arranging the subplots:
- First build three of the graphs on a 2-by-3 grid (this grid should have six subplots, but only build three; the bottom row remains empty).
- Then build only the bottom graph of a 2-by-1 grid (this grid should have two subplots; the top row remains empty).
- The two grids are merged, and end up with three graphs on the top row and one graph on the bottom row.
### Adding the FiveThirtyEight style
style.use('fivethirtyeight')
### Adding the subplots
plt.figure(figsize=(12, 6))
ax1 = plt.subplot(2,3,1)
ax2 = plt.subplot(2,3,2)
ax3 = plt.subplot(2,3,3)
ax4 = plt.subplot(2,1,2)
axes = [ax1, ax2, ax3, ax4]
### Changes to all the subplots
for ax in axes:
ax.set_ylim(0.8, 1.7)
ax.set_yticks([1.0, 1.2, 1.4, 1.6])
ax.set_yticklabels(['1.0', '1.2','1.4', '1.6'],
alpha=0.3)
ax.grid(alpha=0.5)
### Ax1: Bush
ax1.plot(bush['Time'], bush['rolling_mean'],
color='#BF5FFF')
ax1.set_xticklabels(['', '2001', '', '2003', '', '2005', '',
'2007', '', '2009'],
alpha=0.3)
ax1.text(12210.0, 1.92, 'BUSH', fontsize=18, weight='bold',
color='#BF5FFF')
ax1.text(11910.0, 1.8, '(2001-2009)', weight='bold',
alpha=0.3)
### Ax2: Obama
ax2.plot(obama['Time'], obama['rolling_mean'],
color='#ffa500')
ax2.set_xticklabels(['', '2009', '', '2011', '', '2013', '',
'2015', '', '2017'],
alpha=0.3)
ax2.text(15050.0, 1.92, 'OBAMA', fontsize=18, weight='bold',
color='#ffa500')
ax2.text(14920.0, 1.8, '(2009-2017)', weight='bold',
alpha=0.3)
### Ax3: Trump
ax3.plot(trump['Time'], trump['rolling_mean'],
color='#00B2EE')
ax3.set_xticklabels(['2017', '', '2018', '', '2019', '',
'2020', '', '2021'],
alpha=0.3)
ax3.text(17700.0, 1.92, 'TRUMP', fontsize=18, weight='bold',
color='#00B2EE')
ax3.text(17620.0, 1.8, '(2017-2021)', weight='bold',
alpha=0.3)
### Ax4: Bush-Obama-Trump
ax4.plot(bush['Time'], bush['rolling_mean'],
color='#BF5FFF')
ax4.plot(obama['Time'], obama['rolling_mean'],
color='#ffa500')
ax4.plot(trump['Time'], trump['rolling_mean'],
color='#00B2EE')
ax4.grid(alpha=0.5)
ax4.set_xticks([])
### Adding a title and a subtitle
ax1.text(10800.0, 2.35, 'EURO-USD rate averaged 1.22 under the last three US presidents',
fontsize=20, weight='bold')
ax1.text(10800.0, 2.14, '''EURO-USD exchange rates under George W. Bush (2001 - 2009), Barack Obama (2009-2017),
and Donald Trump (2017-2021)''',
fontsize=16)
### Adding a signature
ax4.text(10800.0, 0.65, 'Esteban Rosal' + ' '*103 + 'Source: European Central Bank',
color = '#f0f0f0', backgroundcolor = '#4d4d4d',
size=14)
plt.show()