Plotting
Overview
Teaching: 15 min
Exercises: 15 min
Questions
How can I plot my data?
How can I save my plot for publishing?
Objectives
Create a time series plot showing a single data set.
Create a scatter plot showing relationship between two data sets.
matplotlib is the most widely used scientific plotting library in Python.
- Commonly use a sub-library called matplotlib.pyplot.
- The Jupyter Notebook will render plots inline by default.
import matplotlib.pyplot as plt
- Simple plots are then (fairly) simple to create.
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]
fig, ax = plt.subplots()
ax.plot(time, position)
ax.set_xlabel('Time (hr)')
ax.set_ylabel('Position (km)')
Display All Open Figures
In our Jupyter Notebook example, running the cell should generate the figure directly below the code. The figure is also included in the Notebook document for future viewing. However, other Python environments like an interactive Python session started from a terminal or a Python script executed via the command line require an additional command to display the figure.
Instruct matplotlib to show a figure:
plt.show()
This command can also be used within a Notebook - for instance, to display multiple figures if several are created by a single cell.
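As a rough sketch of the multiple-figure case, the script below creates two figures and then calls plt.show() once to display every open figure. The speed values are made up for illustration and are not part of the lesson data.

```python
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]
speed = [100, 100, 100, 100]  # illustrative values, assumed for this sketch

# First figure: position against time.
fig1, ax1 = plt.subplots()
ax1.plot(time, position)
ax1.set_xlabel("Time (hr)")
ax1.set_ylabel("Position (km)")

# Second figure: speed against time.
fig2, ax2 = plt.subplots()
ax2.plot(time, speed)
ax2.set_xlabel("Time (hr)")
ax2.set_ylabel("Speed (km/hr)")

# A single call displays all figures that are currently open.
plt.show()
```

When run as a script from a terminal, the windows typically stay open until they are closed by hand.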
Plot data directly from a Pandas dataframe.
- We can also plot Pandas dataframes.
- This implicitly uses matplotlib.pyplot.
- First, process the data to create a datetime index.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("data/Thames_Initiative_2009-2017.csv")
data["datetime"] = pd.to_datetime(data["Sampling Date"] + " " + data["Time of sampling"])
data = data.set_index(data.datetime)
dissolved_cols = [col for col in data.columns if col.startswith("Dissolved")]
site_name = "Thames at Wallingford"
f, ax = plt.subplots(figsize=(12,6))
site = data[data["Site name"] == site_name]
ax = site.loc["2011", dissolved_cols].plot(ax=ax)
ax.set_title(site_name)
By default, DataFrame.plot uses the index (one entry per row) as the X axis and draws one line per column.
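To make that default concrete, here is a minimal sketch on a small made-up dataframe (the column names and values are invented for illustration and do not come from the Thames data). It selects the non-interactive Agg backend so that it also runs without a display; that line can be dropped in a Notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a Notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy dataframe: the index holds the X values, each column is one data set.
toy = pd.DataFrame(
    {"Dissolved A": [1.0, 2.0, 3.0, 4.0], "Dissolved B": [4.0, 3.0, 2.0, 1.0]},
    index=pd.Index([0, 1, 2, 3], name="Time (hr)"),
)

fig, ax = plt.subplots()
toy.plot(ax=ax)  # one line per column, index on the X axis

# Inspect what was drawn.
lines = ax.get_lines()
labels = [line.get_label() for line in lines]
x_values = [float(x) for x in lines[0].get_xdata()]
```

Here labels holds the two column names and x_values holds the index values, which is why setting a datetime index before plotting gives a time axis.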
Varying plot type
We can vary the plot type…
Here’s an example using the DataFrame convenience plotting method:
maxima = data.groupby("Site name").max()
maxima["Total Ca (mg/l)"].plot(kind="bar")
One can overplot many datasets with repeated calls to plot functions like scatter, bar, hist, etc.
We can also add a legend by using the label keyword for each dataset and then calling ax.legend().
Here’s an example using the matplotlib figure and axis notation:
f, ax = plt.subplots(figsize=(12, 6))
alpha = 0.5
subset = data.iloc[::10]  # take every 10th row
ax.scatter(subset.index, subset.loc[:, "Total Cu (ug/l)"], c="red", label="Copper", alpha=alpha)
ax.scatter(subset.index, subset.loc[:, "Total Zn (ug/l)"], c="grey", label="Zinc", alpha=alpha)
ax.scatter(subset.index, subset.loc[:, "Total Mn (ug/l)"], c="purple", label="Manganese", alpha=alpha)
ax.set_ylim(0, 40)
ax.set_ylabel("Element concentration (ug/l)")
ax.set_xlabel("Time")
ax.legend()
And lastly, here’s another plot type, a histogram:
import numpy as np
f, ax = plt.subplots()
ax.hist(data["Mean daily flow (m3/s)"], bins=np.logspace(0, 2, 50))
ax.set_xscale("log")
ax.set_xlabel("Mean daily flow (m3/s)")
ax.set_ylabel("Frequency")
Saving your plot to a file
f.savefig('plot.png')
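For publication, a higher resolution or a vector format is often preferable. The sketch below is one possible approach, not a prescribed recipe: it redraws the simple time/position plot and saves it twice using the dpi and bbox_inches options of savefig. The file names plot_highres.png and plot.pdf are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a Notebook
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

f, ax = plt.subplots()
ax.plot(time, position)
ax.set_xlabel("Time (hr)")
ax.set_ylabel("Position (km)")

# Hypothetical output names, chosen for this sketch.
out_png = Path("plot_highres.png")
out_pdf = Path("plot.pdf")

# Raster output: a higher dpi gives more pixels; "tight" trims surrounding whitespace.
f.savefig(out_png, dpi=300, bbox_inches="tight")

# Vector output: the format is inferred from the file extension.
f.savefig(out_pdf, bbox_inches="tight")
```

Vector formats such as PDF scale without pixelation, while PNG is convenient for web pages and slides.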
Key Points
matplotlib is the most widely used scientific plotting library in Python.
Plot data directly from a Pandas dataframe.
Select and transform data, then plot it.
Many styles of plot are available: see the Python Graph Gallery for more options.
Can plot many sets of data together.