Plotting

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I plot my data?

  • How can I save my plot for publishing?

Objectives
  • Create a time series plot showing a single data set.

  • Create a scatter plot showing the relationship between two data sets.

matplotlib is the most widely used scientific plotting library in Python.

import matplotlib.pyplot as plt
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

fig, ax = plt.subplots()
ax.plot(time, position)
ax.set_xlabel('Time (hr)')
ax.set_ylabel('Position (km)')

Figure: Simple Position-Time Plot

Display All Open Figures

In our Jupyter Notebook example, running the cell should generate the figure directly below the code. The figure is also included in the Notebook document for future viewing. However, other Python environments, such as an interactive Python session started from a terminal or a Python script executed from the command line, require an additional command to display the figure.

Instruct matplotlib to show a figure:

plt.show()

This command can also be used within a Notebook, for instance to display multiple figures when several are created by a single cell.
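
As a minimal sketch of that multi-figure case, the snippet below creates two figures and then displays them with one call. The speed values are a hypothetical second data set added only for illustration. Without a graphical display, plt.show() does not open a window, but the figures remain open.

```python
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]
speed = [100, 100, 100, 100]  # hypothetical second data set (km/hr)

# First figure: position against time
fig1, ax1 = plt.subplots()
ax1.plot(time, position)
ax1.set_xlabel("Time (hr)")
ax1.set_ylabel("Position (km)")

# Second figure: speed against time
fig2, ax2 = plt.subplots()
ax2.plot(time, speed)
ax2.set_xlabel("Time (hr)")
ax2.set_ylabel("Speed (km/hr)")

# Numbers of the figures that are currently open
open_figures = plt.get_fignums()

# Display every open figure at once
plt.show()
```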

Plot data directly from a Pandas dataframe.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("data/Thames_Initiative_2009-2017.csv")
data["datetime"] = pd.to_datetime(data["Sampling Date"] + " " + data["Time of sampling"])
data = data.set_index(data.datetime)

dissolved_cols = [col for col in data.columns if col.startswith("Dissolved")]

site_name = "Thames at Wallingford"

f, ax = plt.subplots(figsize=(12,6))
site = data[data["Site name"] == site_name]
ax = site.loc["2011", dissolved_cols].plot(ax=ax)

ax.set_title(site_name)

By default, DataFrame.plot uses the index (the row labels) for the X axis and draws each column as a separate line.
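
To see this default in isolation, here is a small sketch using a made-up DataFrame. The column names and values are hypothetical and are not taken from the Thames data set. The second plot transposes the table, which swaps the roles of rows and columns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical measurements: the index holds sampling dates,
# and each column is one measured variable.
samples = pd.DataFrame(
    {"oxygen": [9.1, 8.7, 10.2], "nitrate": [5.4, 6.0, 5.8]},
    index=pd.to_datetime(["2011-01-10", "2011-02-14", "2011-03-21"]),
)

# Default behaviour: index on the X axis, one line per column
fig, ax = plt.subplots()
samples.plot(ax=ax)

# Transposed: variables on the X axis, one line per sampling date
fig_t, ax_t = plt.subplots()
samples.T.plot(ax=ax_t)
```

If the quantity you want on the X axis is stored in the columns rather than in the index, transposing before plotting is usually the simplest fix.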

Varying plot type

We can vary the plot type…

Here’s an example using the DataFrame convenience plotting method, where the kind keyword selects the plot type.

# Select the column before aggregating so that only this column is reduced.
maxima = data.groupby("Site name")["Total Ca (mg/l)"].max()
maxima.plot(kind="bar")

You can overplot many data sets with repeated calls to plotting functions such as scatter, bar, and hist.

We can also add a legend by passing the label keyword for each data set and then calling ax.legend().

Here’s an example using the matplotlib figure and axes notation.

f, ax = plt.subplots(figsize=(12, 6))
alpha = 0.5
subset = data.iloc[::10]  # keep every 10th row to reduce clutter
ax.scatter(subset.index, subset.loc[:, "Total Cu (ug/l)"], c="red", label="Copper", alpha=alpha)
ax.scatter(subset.index, subset.loc[:, "Total Zn (ug/l)"], c="grey", label="Zinc", alpha=alpha)
ax.scatter(subset.index, subset.loc[:, "Total Mn (ug/l)"], c="purple", label="Manganese", alpha=alpha)
ax.set_ylim(0, 40)
ax.set_ylabel("Element concentration (ug/l)")
ax.set_xlabel("Time")
ax.legend()

Lastly, here’s another plot type, a histogram:

import numpy as np
f, ax = plt.subplots()
# Drop missing values before binning; the bin edges are evenly spaced in log10.
ax.hist(data["Mean daily flow (m3/s)"].dropna(), bins=np.logspace(0, 2, 50))
ax.set_xscale("log")
ax.set_xlabel("Mean daily flow (m3/s)")
ax.set_ylabel("Frequency")
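
The logarithmic bins used above can be examined on their own. The sketch below uses synthetic, randomly generated values as a stand-in for the flow data, which is an assumption for illustration only. It shows that np.logspace(0, 2, 50) returns 50 bin edges running from 1 to 100, giving 49 bins whose edges grow by a constant ratio. Values outside that range are not counted.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for flow data (hypothetical values, not real measurements)
rng = np.random.default_rng(seed=0)
flow = rng.lognormal(mean=2.0, sigma=1.0, size=1000)

# 50 edges evenly spaced in log10 between 10**0 = 1 and 10**2 = 100
edges = np.logspace(0, 2, 50)

fig, ax = plt.subplots()
counts, used_edges, _ = ax.hist(flow, bins=edges)
ax.set_xscale("log")  # with a log axis the bins appear equally wide
ax.set_xlabel("Synthetic flow (m3/s)")
ax.set_ylabel("Frequency")
```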

Saving your plot to a file

f.savefig('plot.png')
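
For publication, it often helps to control the resolution and to trim surplus whitespace around the figure. The following is a minimal sketch rather than a recommended recipe: the temporary output directory and the file names are placeholders, and 300 dpi is only an example value. savefig chooses the file format from the file extension.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

f, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 100, 200, 300])
ax.set_xlabel("Time (hr)")
ax.set_ylabel("Position (km)")

# Placeholder output location; replace with your own directory
out_dir = Path(tempfile.mkdtemp())
png_path = out_dir / "plot.png"
pdf_path = out_dir / "plot.pdf"

# Raster output: dpi sets the resolution, bbox_inches="tight" trims the margins
f.savefig(png_path, dpi=300, bbox_inches="tight")

# Vector output: stays sharp at any zoom level
f.savefig(pdf_path, bbox_inches="tight")
```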

Key Points

  • matplotlib is the most widely used scientific plotting library in Python.

  • Plot data directly from a Pandas dataframe.

  • Select and transform data, then plot it.

  • Many styles of plot are available: see the Python Graph Gallery for more options.

  • You can plot many sets of data together.