Data visualization makes numerical data significantly easier to comprehend than pure tables of numbers. Gaining instant insight into data and identifying patterns, trends, and outliers are the primary uses of charting libraries.
When deciding which stock may be suitable for which algorithmic trading strategy, creating a chart of the stock price is the first step: some strategies are suitable only for trending stocks, some for mean-reverting stocks, and so on. While numerical statistics are critical, there is no substitute for a well-designed chart.
Matplotlib supports plotting multiple charts (subplots) on a single figure, which is Matplotlib’s term for the drawing canvas.
![](Aspose.Words.62007344-189a-42e6-8981-e613289df73a.105.png)
Before we plot anything on this figure, we need to add subplots to create space for them. The matplotlib.figure.Figure.add_subplot(...) method lets us do that by specifying the dimensions of a grid and the subplot's position within that grid.
Creating figures and subplots 113
The following code block adds a subplot occupying the left cell of a 1x2 grid, then a subplot in the top-right cell of a 2x2 grid, and finally, a subplot in the bottom-right cell of a 2x2 grid:
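```python
import matplotlib.pyplot as plt

fig = plt.figure()  # the figure (canvas) that will hold the subplots
ax1 = fig.add_subplot(1, 2, 1)  # 1x2 grid, cell 1: left half
ax2 = fig.add_subplot(2, 2, 2)  # 2x2 grid, cell 2: top right
ax3 = fig.add_subplot(2, 2, 4)  # 2x2 grid, cell 4: bottom right
fig
```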
The result is the following figure object containing the subplots we just added:
Figure 5.1 Figure containing three empty subplots
Now that we have created the space for the charts ("plots"/"subplots"), we can populate them with visualizations. In reports, physical space on the page is expensive, so packing several charts into one figure, as in the preceding example, is good practice.
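To make the (nrows, ncols, index) convention of add_subplot(...) concrete, here is a minimal sketch that recreates the three-subplot layout and checks where each Axes lands; the index counts grid cells left to right, top to bottom, starting at 1. It assumes a headless environment, hence the Agg backend.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: nothing is displayed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 6))
ax1 = fig.add_subplot(1, 2, 1)  # left half of a 1x2 grid
ax2 = fig.add_subplot(2, 2, 2)  # top-right cell of a 2x2 grid
ax3 = fig.add_subplot(2, 2, 4)  # bottom-right cell of a 2x2 grid

# get_position() returns each Axes' bounding box in figure coordinates (0..1)
p1, p2, p3 = (ax.get_position() for ax in (ax1, ax2, ax3))
print('ax1 left of ax2:', p1.x0 < p2.x0)
print('ax2 above ax3:', p2.y0 > p3.y0)
print('ax1 taller than ax2:', p1.height > p2.height)
```

Because the 1x2 and 2x2 grids both have two columns, ax2 and ax3 line up exactly with the right half of the figure.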
Plotting in subplots
Let’s use numpy.linspace(...) to generate evenly spaced values on the x axis, and then the numpy.square(...), numpy.sin(...), and numpy.cos(...) methods to generate corresponding values on the y axis.
114 Data Visualization Using Matplotlib
We will use the ax1, ax2, and ax3 axes variables we got from adding subplots to plot these functions:
```python
import numpy as np

x = np.linspace(0, 1, num=20)  # 20 evenly spaced points in [0, 1]

y1 = np.square(x)
ax1.plot(x, y1, color='black', linestyle='--')

y2 = np.sin(x)
ax2.plot(x, y2, color='black', linestyle=':')

y3 = np.cos(x)
ax3.plot(x, y3, color='black', linestyle='-.')
fig
```
Now, the following figure contains the values we just plotted:
Figure 5.2 Figure containing three subplots plotting the square, sine, and cosine functions
The sharex= parameter can be passed when creating subplots with matplotlib.pyplot.subplots(...) to specify that all the subplots should share the same x axis.
Let's demonstrate this functionality: we plot the square function, and then use the numpy.power(...) function to raise x to the power of 10 and plot both with the same x axis:
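```python
import matplotlib.pyplot as plt
import numpy as np

# x and y1 = np.square(x) come from the previous example
fig, (ax1, ax2) = plt.subplots(2, figsize=(12, 6), sharex=True)
ax1.plot(x, y1, color='black', linestyle='--')

y2 = np.power(x, 10)
ax2.plot(x, y2, color='black', linestyle='-.')
```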
The result is the following figure with a shared x axis and a different function plotted on each graph:
Figure 5.3 Figure with subplots sharing an x axis, containing the square and power-of-10 functions
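Sharing an axis means more than drawing it once: the subplots share their view limits. The following minimal sketch (assuming a headless Agg backend) zooms the top subplot and checks that the bottom one follows.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, num=20)
fig, (ax1, ax2) = plt.subplots(2, figsize=(12, 6), sharex=True)
ax1.plot(x, np.square(x), color='black', linestyle='--')
ax2.plot(x, np.power(x, 10), color='black', linestyle='-.')

# Changing the x limits of one shared Axes propagates to its siblings
ax1.set_xlim(0.5, 1.0)
print(ax2.get_xlim())
```

This is why zooming or panning one chart in an interactive window moves all the charts that share its x axis.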
The charts we generated are not self-explanatory yet: it is unclear what the units on the x axis and the y axis are, and what each chart represents. To improve the charts, we need to enrich them with colors, markers, and line styles; enrich the axes with ticks, legends, and labels; and annotate selected data points.
Enriching plots with colors, markers, and line styles
Colors, markers, and line styles make charts easier to understand.
The code block that follows plots four different functions and uses the following parameters to modify the appearance:
- The color= parameter is used to assign colors.
- The linewidth= parameter is used to change the width/thickness of the lines.
- The marker= parameter assigns different shapes to mark the data points.
- The markersize= parameter changes the size of those markers.
- The alpha= parameter is used to modify the transparency.
- The drawstyle= parameter changes the default line connectivity to step connectivity between data points for one plot.

The code is as follows:
```python
import matplotlib.pyplot as plt
import numpy as np

fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, figsize=(12, 12), sharex=True)

x = np.linspace(0, 10, num=20)
y1 = np.exp(x)
y2 = x ** 3
y3 = np.sin(y2)
y4 = np.random.randn(20)

ax1.plot(x, y1, color='black', linestyle='--', linewidth=5,
         marker='x', markersize=15)
ax2.plot(x, y2, color='green', linestyle='-.', linewidth=2,
         marker='^', markersize=10, alpha=0.9)  # slightly transparent
ax3.plot(x, y3, color='red', linestyle=':', marker='*',
         markersize=15, drawstyle='steps')  # step connectivity
ax4.plot(x, y4, color='green', linestyle='-', marker='s',
         markersize=15)
```
The output displays four functions with different attributes assigned to them:
Figure 5.4 Plot demonstrating different color, line style, marker style, transparency, and size options
Using different colors, line styles, marker styles, transparency, and size options enables us to generate rich charts in which multiple time series are easy to tell apart. Choose colors carefully, as some may not render well on certain laptop screens or when printed on paper.
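Each call to plot(...) returns a list of matplotlib.lines.Line2D objects, and the styling keywords above become properties of those objects. The following sketch (assuming a headless Agg backend) reads them back and changes one afterward; 'steps-post' is used here for illustration, while the 'steps' value used above is documented as equivalent to 'steps-pre'.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(figsize=(6, 4))
# plot() returns one Line2D per plotted series
(line,) = ax.plot(x, x ** 3, color='green', linestyle='-.', linewidth=2,
                  marker='^', markersize=10, alpha=0.9)

# The keyword arguments are stored as line properties with matching getters
print(line.get_linestyle(), line.get_linewidth(), line.get_marker(),
      line.get_markersize(), line.get_alpha(), to_hex(line.get_color()))

# Properties can also be changed after plotting through the setters
line.set_drawstyle('steps-post')  # horizontal segment first, then vertical
```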
Enriching axes is the next step in making outstanding charts.
Enriching axes with ticks, labels, and legends
The charts can be further improved by customizing the axes via ticks, limits, and labels. The matplotlib.pyplot.xlim(...) method sets the range of values on the x axis. The matplotlib.pyplot.xticks(...) method specifies where the ticks show up on the x axis:
```python
# x and y1 = np.exp(x) come from the previous example
plt.xlim([8, 10.5])
plt.xticks([8, 8.42, 8.94, 9.47, 10, 10.5])
plt.plot(x, y1, color='black', linestyle='--', marker='o')
```
This modifies the x axis to be within the specified limits, with the ticks at the explicitly specified values:
Figure 5.5 Plot with explicit limits and ticks on the x axis
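matplotlib.pyplot.xlim(...) and matplotlib.pyplot.xticks(...) act on the current Axes. A minimal sketch of the equivalent object-oriented calls, which the rest of this chapter uses, might look like this (a headless Agg backend is assumed):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
y1 = np.exp(x)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(x, y1, color='black', linestyle='--', marker='o')
ax.set_xlim(8, 10.5)                            # same effect as plt.xlim([8, 10.5])
ax.set_xticks([8, 8.42, 8.94, 9.47, 10, 10.5])  # same effect as plt.xticks([...])

print(ax.get_xlim(), list(ax.get_xticks()))
```

Working with an explicit ax object avoids surprises about which subplot is "current" when a figure holds several of them.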
We can also change the scale of one of the axes to non-linear using the matplotlib.Axes.set_yscale(...) method.
The matplotlib.Axes.set_xticklabels(...) method changes the labels of the ticks on the x axis, whose positions are fixed with the matplotlib.Axes.set_xticks(...) method:
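```python
fig, ax = plt.subplots(1, figsize=(12, 6))
ax.set_yscale('log')
ax.set_xticks(x)  # one tick per data point: 20 ticks
# The number of labels must match the number of ticks: 20 letters, A to T
ax.set_xticklabels(list('ABCDEFGHIJKLMNOPQRST'))
ax.plot(x, y1, color='black', linestyle='--', marker='o', label='y=exp(x)')
```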
The output of that code block shows that the y axis is now logarithmic and that the x-axis ticks carry the specified tick labels:
Figure 5.6 Plot with a logarithmic y-axis scale and custom x-axis tick labels
Logarithmic scales are useful when the dataset spans a large range of values and/or when we want to communicate percentage changes or multiplicative factors.
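A log-scaled axis positions each value at its logarithm, which explains both uses. As a small worked example, the exponential y1 = exp(x) from the previous plots becomes a straight line with slope log10(e), about 0.434 per unit of x, and equal ratios map to equal distances:

```python
import numpy as np

x = np.linspace(0, 10, num=20)
y1 = np.exp(x)

# On a log axis, the plotted position is log10(value)
log_y = np.log10(y1)
slope, intercept = np.polyfit(x, log_y, deg=1)  # fit a straight line
print(slope, np.log10(np.e))

# Equal ratios become equal distances: 10 -> 100 and 1000 -> 10000 are both one decade
print(np.log10(100) - np.log10(10), np.log10(10000) - np.log10(1000))
```

This is why a constant percentage growth rate looks like a straight line on a logarithmic chart.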
The matplotlib.Axes.set_title(...) method adds a title to the plot, and the matplotlib.Axes.set_xlabel(...) and matplotlib.Axes.set_ylabel(...) methods set labels for the x and y axes.
The matplotlib.Axes.legend(...) method adds a legend, which makes the plots easier to interpret. The loc= parameter specifies the location of the legend on the plot; loc='best' means Matplotlib picks the best location automatically:
```python
ax.set_title('xtickslabel example')
ax.set_xlabel('x labels')
ax.set_ylabel('log scale y values')
ax.legend(loc='best')  # uses the label= passed to ax.plot(...)
fig
```
The following plot shows the title, the x- and y-axis labels, and the legend:
Figure 5.7 Plot demonstrating a title, x- and y-axis labels, and a legend
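The legend is assembled from the label= argument of each plotted line, one entry per labeled line. The following sketch adds a second, hypothetical series (exp(x/2)) to show this, and reads the title and legend entries back; a headless Agg backend is assumed.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_yscale('log')
ax.plot(x, np.exp(x), color='black', linestyle='--', marker='o', label='y=exp(x)')
# Hypothetical second series, added only to show a two-entry legend
ax.plot(x, np.exp(x / 2), color='gray', linestyle=':', label='y=exp(x/2)')
ax.set_title('xtickslabel example')
ax.set_xlabel('x labels')
ax.set_ylabel('log scale y values')
legend = ax.legend(loc='best')

print([t.get_text() for t in legend.get_texts()])
```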
Charts that render each time series differently and explain the units and labels of the axes go a long way toward being understandable. However, there are often special data points that would benefit from being pointed out.
Enriching data points with annotations
The matplotlib.Axes.text(...) method adds a text box to our plots:
```python
ax.text(1, 10000, 'Generated using numpy and matplotlib')  # (x, y) in data coordinates
fig
```
The output is as follows:
Figure 5.8 Plot displaying Matplotlib text annotations
The matplotlib.Axes.annotate(...) method provides more control over the annotations.
The code block that follows uses the following parameters to control the annotation:
- The xy= parameter specifies the location of the data point.
- The xytext= parameter specifies the location of the text box.
- The arrowprops= parameter accepts a dictionary specifying parameters to control the arrow from the text box to the data point. Within that dictionary, the facecolor= key specifies the color of the arrow and the shrink= key specifies the fraction of the arrow's length trimmed from both ends, so that it does not touch the text or the point.
- The horizontalalignment= and verticalalignment= parameters specify how the text box is aligned relative to its xytext= position.
The code is as follows:
```python
for i in [5, 10, 15]:
    # label text showing the coordinates of the data point
    s = '(x=' + str(x[i]) + ',y=' + str(y1[i]) + ')'
    ax.annotate(s, xy=(x[i], y1[i]), xytext=(x[i] + 1, y1[i] - 5),
                arrowprops=dict(facecolor='black', shrink=0.05),
                horizontalalignment='left', verticalalignment='top')
fig
```
The result is as follows:
Figure 5.9 Plot with text and arrow annotations of data points
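A common use of annotate(...) is calling out an extreme value. The sketch below assumes a hypothetical noisy series generated with a seeded random number generator, locates its maximum with numpy.argmax(...), and points an arrow at it; a headless Agg backend is assumed.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, num=20)
y = rng.standard_normal(20).cumsum()  # hypothetical noisy series

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(x, y, color='black', linestyle='-', marker='o')

i = int(np.argmax(y))  # index of the highest point
note = ax.annotate(f'max: y={y[i]:.2f}',
                   xy=(x[i], y[i]),                # arrow tip at the data point
                   xytext=(x[i] + 0.5, y[i] + 1),  # text up and to the right
                   arrowprops=dict(facecolor='black', shrink=0.05),
                   horizontalalignment='left', verticalalignment='bottom')
print(note.xy, note.get_text())
```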
Drawing attention to the key data points helps the reader focus on the message of the chart. The matplotlib.Axes.add_patch(...) method can be used to add different shape annotations.
The code block that follows adds a matplotlib.pyplot.Circle object, which accepts the following:
- The xy= parameter to specify the location of the circle's center
- The radius= parameter to specify the circle radius
- The color= parameter to specify the color of the circle
The code is as follows:
```python
fig, ax = plt.subplots(1, figsize=(12, 6))
ax.plot(x, x, linestyle='--', color='black', marker='*', markersize=15)
for val in x:
    # a circle centered on each (val, val) data point
    ax.add_patch(plt.Circle(xy=(val, val), radius=0.3, color='darkgray'))
```
This generates the following plot with circles around the data points:
Figure 5.10 Plot containing circle annotations around data points generated from adding a patch
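Circle patches are drawn in data coordinates, so when one unit on the x axis and one unit on the y axis span a different number of pixels, the circles render as ellipses. A minimal sketch of one way to keep them round is to request an equal aspect ratio; the patches themselves are kept in ax.patches (a headless Agg backend is assumed).

```python
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, num=20)
fig, ax = plt.subplots(1, figsize=(12, 6))
ax.plot(x, x, linestyle='--', color='black', marker='*', markersize=15)
for val in x:
    ax.add_patch(plt.Circle(xy=(val, val), radius=0.3, color='darkgray'))

# One data unit gets the same length on both axes, so circles stay circular
ax.set_aspect('equal')
print(len(ax.patches), ax.patches[0].center, ax.patches[0].radius)
```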
Now that we have generated beautiful, professional charts, we need to learn how to share the images.
Saving plots to files
The matplotlib.figure.Figure.savefig(...) method enables us to save plots to disk in different file formats, with several size and resolution specifiers, such as the dpi= parameter:
```python
fig.savefig('fig.png', dpi=200)  # file format inferred from the .png extension
```
This writes the following plot to the fig.png file:
Figure 5.11 Matplotlib plot written to a file on disk and opened with an external viewer
Exported images of trading strategies' performance are frequently used in HTML or email reports. For printing, set the DPI of the charts to the DPI of your printer.
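The pixel dimensions of a saved raster image are the figure size in inches multiplied by the dpi= value. The following sketch checks this for a small figure by saving it to an in-memory PNG and reading the width and height from the PNG header (a headless Agg backend is assumed).

```python
import io
import struct

import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # figure size in inches
ax.plot([0, 1], [0, 1], color='black')

buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=50)  # pixels = inches * dots per inch

# A PNG stores width and height as big-endian 32-bit integers at bytes 16..24 (IHDR chunk)
width, height = struct.unpack('>II', buf.getvalue()[16:24])
print(width, height)  # 200 150
```

By the same rule, a figure created with figsize=(12, 6) and saved with dpi=200 would be 2400x1200 pixels.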
Charting a pandas DataFrame with Matplotlib
The pandas library provides plotting capabilities for Series and DataFrame objects using Matplotlib.
Let's create a pandas DataFrame with a Cont value column containing continuous values that mimic prices, and Delta1 value and Delta2 value columns that mimic price changes. The Cat value column contains categorical data drawn from five possibilities. We also discretize the two delta columns into five buckets labeled -2 to 2:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(1000),
                  columns=['Cont value', 'Delta1 value',
                           'Delta2 value', 'Cat value'])
df['Cont value'] = np.random.randn(1000).cumsum()  # random walk mimicking prices
df['Delta1 value'] = np.random.randn(1000)
df['Delta2 value'] = np.random.randn(1000)
# 200 copies of each of the five categories, shuffled
df['Cat value'] = np.random.permutation(['Very high', 'High', 'Medium',
                                         'Low', 'Very Low'] * 200)
# bucket each delta into five equal-width bins labeled -2..2
df['Delta1 discrete'] = pd.cut(df['Delta1 value'], labels=[-2, -1, 0, 1, 2],
                               bins=5).astype(np.int64)
df['Delta2 discrete'] = pd.cut(df['Delta2 value'], labels=[-2, -1, 0, 1, 2],
                               bins=5).astype(np.int64)
df
```
This generates the following DataFrame:
| | Cont value | Delta1 value | Delta2 value | Cat value | Delta1 discrete | Delta2 discrete |
|---|---|---|---|---|---|---|
| 0 | -1.429618 | 0.595897 | -0.552871 | Very high | 1 | 0 |
| 1 | -0.710593 | 1.626343 | 1.123142 | Medium | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 998 | -4.928133 | -0.426593 | -0.141742 | Very high | 0 | 0 |
| 999 | -5.947680 | -0.183414 | -0.358367 | Medium | 0 | 0 |

1000 rows × 6 columns
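The Delta1 discrete and Delta2 discrete columns come from pandas.cut(...): with bins=5, the observed range of the data is split into five equal-width intervals, and labels= replaces each interval with the given value. A small sketch on hypothetical, evenly spaced data makes the mapping visible:

```python
import pandas as pd

values = pd.Series(range(10))  # hypothetical data: 0, 1, ..., 9
# bins=5 splits the observed range [0, 9] into five equal-width intervals;
# labels= names the intervals -2..2 instead of showing the interval edges
discrete = pd.cut(values, bins=5, labels=[-2, -1, 0, 1, 2]).astype('int64')
print(discrete.tolist())  # [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
```

Because the bins have equal width, normally distributed deltas such as those in df land mostly in the middle buckets, with fewer observations in the -2 and 2 buckets.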
Let's explore different ways to visualize this DataFrame.
Creating line plots of a DataFrame column
We can plot 'Cont value' in a line plot using the pandas.DataFrame.plot(...) method with the kind= parameter:
```python
df.plot(y='Cont value', kind='line', color='black', linestyle='-',
        figsize=(12, 6))
```
This command produces the following chart:
Figure 5.12 Line plot generated using the pandas.DataFrame.plot(...) method
Line charts are typically used for displaying time series.
Creating bar plots of a DataFrame column
The pandas.DataFrame.plot(...) method can be used with the kind='bar' parameter to build a bar chart.
Let's first group the DataFrame by the 'Cat value' column, and then plot the counts of each Delta1 discrete value within each group in a bar chart:
```python
df.groupby('Cat value')['Delta1 discrete'] \
    .value_counts().plot(kind='bar', color='darkgray',
                         title='Occurrence by (Cat,Delta1)', figsize=(12, 6))
```
This generates the following plot showing the frequency of (Cat value, Delta1 discrete) value pairs:
Figure 5.13 Vertical bar plot displaying the frequency of (Cat value, Delta1 discrete) value pairs
The kind='barh' parameter builds a horizontal bar plot instead of a vertical one:
```python
df.groupby('Delta2 discrete')['Cat value'].value_counts() \
    .plot(kind='barh', color='darkgray',
          title='Occurrence by (Delta2,Cat)',
          figsize=(12, 12))
```
The output is as follows:
Figure 5.14 Horizontal bar plot displaying the frequency of (Delta2 discrete, Cat value) pairs
Bar plots are most suitable for comparing the magnitude of categorical values.
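What the bar plots above actually draw is the Series returned by groupby(...)[...].value_counts(), which is indexed by (group, value) pairs. The following sketch uses a hypothetical six-row miniature of df to show that structure; each index pair becomes one bar.

```python
import pandas as pd

# Hypothetical miniature of df with the two columns used above
small = pd.DataFrame({
    'Cat value': ['High', 'High', 'Low', 'Low', 'Low', 'High'],
    'Delta1 discrete': [1, 1, 0, 1, 0, 0],
})
counts = small.groupby('Cat value')['Delta1 discrete'].value_counts()
# A Series with a MultiIndex of (Cat value, Delta1 discrete) pairs;
# counts.plot(kind='bar') would draw one bar per pair
print(counts.sort_index().to_dict())
```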
Creating histogram and density plots of a DataFrame column
The kind='hist' parameter in the pandas.DataFrame.plot(...) method builds a histogram.
Let's create a histogram of the Delta1 discrete values:
```python
df['Delta1 discrete'].plot(kind='hist', color='darkgray', figsize=(12, 6),
                           label='Delta1')
plt.legend()
```
The generated histogram is shown here:
Figure 5.15 Histogram of Delta1 discrete frequency
We can build a Probability Density Function (PDF) by specifying the kind='kde' parameter, which estimates the PDF of the Delta2 discrete values using Kernel Density Estimation (KDE); this plot type requires SciPy to be installed:
```python
df['Delta2 discrete'].plot(kind='kde', color='black', figsize=(12, 6),
                           label='Delta2 kde')
plt.legend()
```
The output is as follows:
Figure 5.16 KDE plot displaying the PDF of Delta2 discrete values
Histograms and PDFs/KDEs are used to assess the probability distribution of a random variable.
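The idea behind a Gaussian KDE fits in a few lines: place a Gaussian bump of width h (the bandwidth) on every observation and average the bumps. The sketch below is a minimal NumPy version using Scott's rule for the bandwidth, the default of scipy.stats.gaussian_kde, which pandas uses for kind='kde'; it illustrates the technique and is not SciPy's implementation. The input data is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Minimal 1-D Gaussian KDE with a Scott's-rule bandwidth (a sketch, not SciPy's code)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: bandwidth = sample std * n ** (-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Center a Gaussian bump of width h on every sample, then average the bumps
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)  # hypothetical data
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(samples, grid)

# A density integrates to 1; approximate the integral with a Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

A smaller bandwidth follows the data more closely but looks noisier; a larger one gives a smoother but blurrier estimate.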
Creating scatter plots of two DataFrame columns
Scatter plots from the pandas.DataFrame.plot(...) method are generated using the kind='scatter' parameter.
The following code block plots a scatter plot between the Delta1 and Delta2 values:
```python
df.plot(kind='scatter', x='Delta1 value', y='Delta2 value',
        alpha=0.5, color='black', figsize=(8, 8))
```
The output is as follows:
Figure 5.17 Scatter plot of the Delta1 value and Delta2 value fields
The pandas.plotting.scatter_matrix(...) method builds a matrix of plots for the Delta1 and Delta2 values, with scatter plots on the off-diagonal entries and histogram or KDE plots (selected with the diagonal= parameter) on the diagonal entries:
```python
pd.plotting.scatter_matrix(df[['Delta1 value', 'Delta2 value']],
                           diagonal='kde', color='black', figsize=(8, 8))
```
The output is as follows:
Figure 5.18 Scatter matrix plot of the Delta1 value and Delta2 value fields
Scatter plots/scatter matrices are used to observe relationships between two variables.
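A scatter plot is often read alongside a number: the Pearson correlation coefficient summarizes how linear the relationship is. The sketch below builds two independent columns, like Delta1 value and Delta2 value in df, plus a hypothetical third column constructed to be correlated with the first (theoretical correlation 1/sqrt(2), about 0.71).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
delta1 = rng.standard_normal(1000)
frame = pd.DataFrame({
    'Delta1 value': delta1,
    'Delta2 value': rng.standard_normal(1000),            # independent of delta1
    'Related value': delta1 + rng.standard_normal(1000),  # hypothetical correlated column
})
# Pairwise Pearson correlations: near 0 for an unstructured cloud, near +/-1 for a line
corr = frame.corr()
print(corr.round(2))
```

A correlation near 0 matches the round, structureless cloud in the Delta1/Delta2 scatter plot; note that it only measures linear association.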
Plotting time series data
The following code block creates a pandas DataFrame containing prices for two hypothetical trading instruments, A and B. The DataFrame is indexed by a DatetimeIndex of daily dates from 1992-01-01 to 2012-10-22:
```python
import numpy as np
import pandas as pd

dates = pd.date_range('1992-01-01', '2012-10-22')  # daily dates
time_series = pd.DataFrame(index=dates, columns=['A', 'B'])
# random walks: cumulative sums of random integer steps, starting near 5000
time_series['A'] = \
    np.random.randint(low=-100, high=101, size=len(dates)).cumsum() + 5000
time_series['B'] = \
    np.random.randint(low=-75, high=76, size=len(dates)).cumsum() + 5000
time_series
```
The resulting DataFrame is as follows:
| | A | B |
|---|---|---|
| 1992-01-01 | 5079 | 5042 |
| 1992-01-02 | 5088 | 5047 |
| ... | ... | ... |
| 2012-10-21 | 6585 | 7209 |
| 2012-10-22 | 6634 | 7247 |

7601 rows × 2 columns
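Each column is a random walk: a starting level of 5000 plus the running sum of random integer steps. The sketch below rebuilds column A with a seeded numpy.random.Generator (swapped in for np.random.randint only so the result is reproducible; note that high= is exclusive in both, so the steps lie in [-100, 100]) and checks two properties: the date count and the fact that differencing the prices recovers the steps.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1992-01-01', '2012-10-22')  # daily, both ends inclusive
rng = np.random.default_rng(7)  # seeded generator (an assumption, for reproducibility)
steps = rng.integers(low=-100, high=101, size=len(dates))  # integers in [-100, 100]
prices = pd.Series(steps.cumsum() + 5000, index=dates, name='A')

print(len(dates))  # 7601
# Differencing a cumulative sum recovers the original steps (from the second day on)
print(np.array_equal(prices.diff().dropna().to_numpy(), steps[1:]))
```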
Let’s use this time series for representative types of plots.
Plotting prices in a line plot
First, let’s plot the daily prices for A and B over 20 years with line plots:
```python
# linestyle=' ' would draw no line at all, so A uses a dashed line
time_series['A'].plot(kind='line', linestyle='--',
                      color='black', figsize=(12, 6), label='A')
time_series['B'].plot(kind='line', linestyle='-.',
                      color='darkgray', figsize=(12, 6), label='B')
plt.legend()
```
The output is as follows:
Figure 5.19 Plot displaying prices for hypothetical instruments A and B over a period of 20 years
While most time series charts are line plots, other chart types provide additional insight.
Plotting price change histograms
The usual next step in financial time series analysis is to inspect changes in price over some duration.
The following code block generates six new fields representing changes in prices over 1 day, 5 days, and 20 days, using the pandas.DataFrame.shift(...) and pandas.DataFrame.fillna(...) methods. We also drop rows with missing data due to the shift, and save the final DataFrame as time_series_deltas:
```python
# shift(-n) brings the price n days ahead into the current row,
# so each field is the forward price change over n days
time_series['A_1_delta'] = \
    time_series['A'].shift(-1) - time_series['A'].fillna(0)
time_series['B_1_delta'] = \
    time_series['B'].shift(-1) - time_series['B'].fillna(0)
time_series['A_5_delta'] = \
    time_series['A'].shift(-5) - time_series['A'].fillna(0)
time_series['B_5_delta'] = \
    time_series['B'].shift(-5) - time_series['B'].fillna(0)
time_series['A_20_delta'] = \
    time_series['A'].shift(-20) - time_series['A'].fillna(0)
time_series['B_20_delta'] = \
    time_series['B'].shift(-20) - time_series['B'].fillna(0)

# the last rows have no future price and contain NaN; drop them
time_series_deltas = time_series[['A_1_delta', 'B_1_delta',
                                  'A_5_delta', 'B_5_delta',
                                  'A_20_delta', 'B_20_delta']].dropna()
time_series_deltas
```
The DataFrame contains the following:
| | A_1_delta | B_1_delta | A_5_delta | B_5_delta | A_20_delta | B_20_delta |
|---|---|---|---|---|---|---|
| 1992-01-01 | 9.0 | 5.0 | -49.0 | 118.0 | -249.0 | -56.0 |
| 1992-01-02 | -91.0 | 69.0 | -84.0 | 123.0 | -296.0 | -92.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 2012-10-01 | 88.0 | 41.0 | -40.0 | -126.0 | -148.0 | -84.0 |
| 2012-10-02 | -10.0 | -44.0 | -71.0 | -172.0 | -187.0 | -87.0 |

7581 rows × 6 columns
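The shift(-n) arithmetic is easy to check on a hypothetical five-row series. The sketch shows that prices.shift(-n) - prices is the forward n-period change, that pandas.Series.diff(-n) computes the same quantity with the opposite sign, and why the last n rows are dropped. Since these prices contain no NaN values, the fillna(0) call in the code above has no effect on the result.

```python
import pandas as pd

prices = pd.Series([100, 102, 101, 105, 110], name='A')  # hypothetical prices
n = 2
# shift(-n) pulls the value from n rows ahead, giving the forward n-period change
forward = prices.shift(-n) - prices
print(forward.tolist())  # [1.0, 3.0, 9.0, nan, nan]

# diff(-n) computes prices - prices.shift(-n), hence the minus sign
print(forward.equals(-prices.diff(-n)))  # True

# The last n rows have no future value, which is why dropna() removes them
print(len(forward.dropna()))  # 3
```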
We can plot the price change histogram for A based on what we have learned in this chapter with the following block of code:
```python
time_series_deltas['A_20_delta'].plot(kind='hist', color='black', alpha=0.5,
                                      label='A_20_delta', figsize=(8, 8))
time_series_deltas['A_5_delta'].plot(kind='hist', color='darkgray', alpha=0.5,
                                     label='A_5_delta', figsize=(8, 8))
time_series_deltas['A_1_delta'].plot(kind='hist', color='lightgray', alpha=0.5,
                                     label='A_1_delta', figsize=(8, 8))
plt.legend()
```
The output is as follows:
Figure 5.20 Histogram of A_1, A_5, and A_20 deltas
Histograms are used for assessing the probability distribution of the underlying data. This particular histogram suggests that A_20_delta has the greatest variability, which is expected: a 20-day change accumulates 20 daily moves of the random walk, so its spread is wider than that of the 5-day or 1-day changes.
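The widening of the histograms can be quantified. For a random walk with independent daily steps, the variance of an n-day change is n times the variance of a 1-day change. The following simulation sketch uses the same step distribution as A but a longer, hypothetical 100,000-day path so that the variance estimates are stable.

```python
import numpy as np

rng = np.random.default_rng(3)
steps = rng.integers(low=-100, high=101, size=100_000)  # same step distribution as A
prices = steps.cumsum() + 5000

def forward_change(p, n):
    """Change from each day to n days later (what shift(-n) minus p computes, NaNs dropped)."""
    return p[n:] - p[:-n]

var1 = forward_change(prices, 1).var()
for n in (5, 20):
    ratio = forward_change(prices, n).var() / var1
    print(n, round(ratio, 1))  # close to n for a random walk
```

Standard deviations therefore grow with the square root of the horizon: the 20-day changes are roughly sqrt(20), about 4.5, times as wide as the daily ones.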
Creating price change density plots
We can also plot the density of price changes using the KDE PDF.
The following code block plots the density function for price changes in B:
```python
time_series_deltas['B_20_delta'].plot(kind='kde', linestyle='-', linewidth=2,
                                      color='black', label='B_20_delta',
                                      figsize=(8, 8))
time_series_deltas['B_5_delta'].plot(kind='kde', linestyle=':', linewidth=2,
                                     color='black', label='B_5_delta',
                                     figsize=(8, 8))
time_series_deltas['B_1_delta'].plot(kind='kde', linestyle='--', linewidth=2,
                                     color='black', label='B_1_delta',
                                     figsize=(8, 8))
plt.legend()
```
The output is as follows:
Figure 5.21 KDE density plot for price changes in B over 1, 5, and 20 days
KDE density plots are very similar to histograms, but whereas histograms consist of discrete bins, KDEs are continuous curves.
Creating box plots by interval
We can group daily prices by different intervals, such as yearly, quarterly, monthly, or weekly, and display the distribution of those prices using box plots.
The following piece of code first creates a pandas.Grouper object with freq='A' to specify annual periodicity (recent pandas versions deprecate this alias in favor of freq='YE'), and then passes it to the pandas.DataFrame.groupby(...) method to build a DataFrameGroupBy object. Finally, we call the DataFrameGroupBy.boxplot(...) method to generate the box plot. We specify the rot=90 parameter to rotate the x-axis tick labels to make them more readable:
```python
group_A = time_series[['A']].groupby(pd.Grouper(freq='A'))  # one group per year
group_A.boxplot(color='black', subplots=False, rot=90, figsize=(12, 12))
```
The output is as follows:
Figure 5.22 Figure containing the box plot distribution of A’s prices grouped by year
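Box plots with whiskers are used for visualizing groups of numerical data through their corresponding quartiles:
- The box's lower bound corresponds to the lower quartile, while the box's upper bound represents the group's upper quartile.
- The line within the box displays the value of the median of the interval.
- The whisker below the box ends at the lowest observation within 1.5 times the interquartile range (IQR) below the lower quartile (Matplotlib's default); when there are no outliers, this is the lowest observation.
- The whisker above the box ends at the highest observation within 1.5 times the IQR above the upper quartile; observations beyond the whiskers are drawn individually as outliers.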
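The statistics behind one box can be computed directly. The sketch below uses a hypothetical group of ten prices with one extreme value and applies the default 1.5 x IQR whisker rule described above; in the yearly box plot, the same computation is repeated for each year's group.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # hypothetical group with one outlier

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
# Whiskers reach the most extreme observations within 1.5 * IQR of the box
low_whisker = data[data >= q1 - 1.5 * iqr].min()
high_whisker = data[data <= q3 + 1.5 * iqr].max()
outliers = data[(data < low_whisker) | (data > high_whisker)]
print(q1, median, q3, low_whisker, high_whisker, outliers)
```

Here the box spans 3.25 to 7.75 with the median at 5.5, the whiskers stop at 1 and 9, and 100 is drawn as a separate outlier point instead of stretching the upper whisker.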
Creating lag scatter plots
We can visualize the relationships between the different price change variables using the pandas.plotting.scatter_matrix(...) method:
```python
pd.plotting.scatter_matrix(time_series[['A_1_delta', 'A_5_delta', 'A_20_delta',
                                        'B_1_delta', 'B_5_delta', 'B_20_delta']],
                           diagonal='kde', color='black', alpha=0.25,
                           figsize=(12, 12))
```
The result shows some linear relationships between the (A_5_delta, A_1_delta), (A_5_delta, A_20_delta), (B_1_delta, B_5_delta), and (B_5_delta, B_20_delta) variable pairs:
Figure 5.23 Scatter matrix plot for A and B price delta variables
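These linear relationships arise because the forward windows overlap: a 1-day change and a 5-day change starting on the same day share the first daily step. For a random walk with independent steps, the correlation between an m-day and an n-day forward change (m < n) is sqrt(m/n). The following simulation sketch checks this on a hypothetical 100,000-day path with A's step distribution.

```python
import numpy as np

rng = np.random.default_rng(5)
prices = rng.integers(low=-100, high=101, size=100_000).cumsum() + 5000

def forward_change(p, n, length):
    """n-day forward change for each of the first `length` days."""
    return p[n:n + length] - p[:length]

length = len(prices) - 20  # align all three series on the same start days
d1, d5, d20 = (forward_change(prices, n, length) for n in (1, 5, 20))

# Overlapping windows share daily steps: corr(d_m, d_n) = sqrt(m / n) for m < n
print(np.corrcoef(d1, d5)[0, 1], np.sqrt(1 / 5))
print(np.corrcoef(d5, d20)[0, 1], np.sqrt(5 / 20))
```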
We can also use the pandas.plotting.lag_plot(...) method with different lag= values to specify different levels of lag to generate the scatter plots between prices and lagged prices for A:
```python
fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12, 12))
pd.plotting.lag_plot(time_series['A'], ax=ax1, lag=1, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax2, lag=7, c='black', alpha=0.2)
pd.plotting.lag_plot(time_series['A'], ax=ax3, lag=20, c='black', alpha=0.2)
```
This generates the following three plots for lags of 1, 7, and 20 days:
Figure 5.24 Lag plots for A’s prices with lag values of 1, 7, and 20 days, showing martingale properties
Lag plots check whether a time series is random without any trend: for a random time series, the lag plots show no structure. The preceding plots show a clear linear relationship between prices and lagged prices; that is, we may succeed in modeling the series with an autoregressive model.
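The structure a lag plot reveals can be summarized by the correlation between each value and the value lag steps later. The sketch below compares hypothetical white noise, which should show no structure, with a random walk built from the same noise, which should cluster tightly along the diagonal.

```python
import numpy as np

rng = np.random.default_rng(11)
noise = rng.standard_normal(5000)  # white noise: no structure
walk = 5000 + noise.cumsum()       # random walk built from the same noise

def lag_corr(series, lag):
    """Pearson correlation between series[t] and series[t + lag] (the lag plot's two axes)."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

for lag in (1, 7, 20):
    print(lag, round(lag_corr(walk, lag), 3), round(lag_corr(noise, lag), 3))
```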
Creating autocorrelation plots
Autocorrelation plots visualize the relationship between prices at a certain point in time and prices lagged by a certain number of periods.
We can use the pandas.plotting.autocorrelation_plot(...) method to plot lag values on the x axis and the correlation between price and price lagged by the specified value on the y axis:
```python
fig, ax = plt.subplots(1, figsize=(12, 6))
pd.plotting.autocorrelation_plot(time_series['A'], ax=ax)
```
We can see that as lag values increase, the autocorrelation slowly deteriorates:
Figure 5.25 Plot displaying the relationship between lag values versus autocorrelation between prices and prices lagged by a specified value
Autocorrelation plots summarize the randomness of a time series. For a random time series, all autocorrelations would be close to 0 for all lags. For a non-random time series, at least one of the autocorrelations would be significantly non-zero.
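The quantity on the y axis is the sample autocorrelation: the covariance between the series and its lagged copy, divided by the lag-0 variance. The minimal NumPy sketch below computes it for hypothetical white noise and for a random walk. The 4/sqrt(n) threshold in the check is an illustrative choice; pandas' autocorrelation_plot draws its own horizontal confidence bands (roughly +/-1.96/sqrt(n) for 95%), and values outside them suggest non-zero autocorrelation.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r(h) for h = 1..max_lag, normalized by the lag-0 autocovariance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    c0 = np.sum(centered ** 2) / n
    return np.array([np.sum(centered[:n - h] * centered[h:]) / n / c0
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(21)
noise = rng.standard_normal(2000)  # hypothetical random series
walk = noise.cumsum()              # hypothetical non-random series (random walk)

noise_acf = acf(noise, 20)
walk_acf = acf(walk, 20)
print(np.round(noise_acf[:5], 3))  # all close to 0
print(np.round(walk_acf[:5], 3))   # close to 1 and slowly decaying
```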
Summary
In this chapter, we have learned how to create visually appealing charts of pandas DataFrames with Matplotlib. While we can calculate many numerical statistics, charts usually offer greater insight more rapidly. It pays to plot several different kinds of charts, since each provides a different view of the data.
In the next chapter, we will learn how to perform statistical tests and estimate statistical models in Python.