# Jupyter and Python: Data Evaluation and Visualization (Hands-on Session) (EN)

_Track: Research Software and Data with Jupyter and GitLab_

This hands-on session will guide you through your first steps with Jupyter and Python.

## Talk at HeFDI Data Week 2024

Following up on the introductory part, this session walks through an example data evaluation and visualization using Python in a JupyterHub environment.

Important notice: The additional text and image files must be located in the same folder as the Jupyter notebooks (.ipynb files); otherwise they cannot be loaded and displayed.

_Dr. Christian Berger, Philipps-Universität Marburg_

DOI-URL: https://doi.org/10.5281/zenodo.12683967

License: [Creative Commons Attribution (CC-BY) 4.0 International](https://creativecommons.org/licenses/by/4.0/)

## Content

1. Getting started
2. Loading data
3. Visualizing raw data
4. Data processing
5. Visualizing processed data
6. Store results
7. Comparing experimental data with theory

## Link List

- [Website of Project Jupyter](https://jupyter.org/)
- [Project Jupyter Documentation](https://docs.jupyter.org/en/latest/)
- [Jupyter Service at Philipps-Universität Marburg](https://jupyter.uni-marburg.de)
- [Information on Jupyter Service of Philipps-Universität Marburg](https://www.uni-marburg.de/en/hrz/services/interactive-computing)
- [Website of Python](https://www.python.org/)



# 1. Getting started

## Login to JupyterHub

- Open URL: https://jupyterhub-test.online.uni-marburg.de/
- __Choose__ a *Username* and a *Password*
- *Accept* terms and conditions
- *Sign in*

![](login.png)

## Opening a notebook

- Open the notebook by double-clicking *1_data_evaluation_and_visualization.ipynb* in the file browser on the left

![](open.png)

# 2. Loading data

Get source data:

![](Galton_jm.png)

Source: https://de.wikipedia.org/wiki/Datei:Galton_jm.png

Background information: https://en.wikipedia.org/wiki/Galton_board

In [None]:
# Experimental setting

N = 20
p = 0.5
repeats = 417
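To get a feeling for what these three numbers mean, the experiment can also be simulated. The sketch below assumes that each ball is deflected to the right with probability `p` at each of the `N` rows of pins, and that the bin index equals the number of right deflections. The function name `simulate_galton` and the seed are illustrative choices, not part of the session material.

```python
import numpy as np

def simulate_galton(n_rows, p_right, n_balls, seed=0):
    """Return the bin index (0..n_rows) for each simulated ball."""
    rng = np.random.default_rng(seed)
    # Each ball makes n_rows independent left/right decisions;
    # the number of right deflections is binomially distributed.
    return rng.binomial(n_rows, p_right, size=n_balls)

simulated_runs = simulate_galton(n_rows=20, p_right=0.5, n_balls=417)
print(len(simulated_runs), simulated_runs.min(), simulated_runs.max())
```

Such simulated values can serve as a stand-in when the measured data file is not at hand.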

## Task: Inspect files on system to determine variables
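One way to approach this task from inside the notebook is to list the working directory and look at the first few lines of the data file. This is a minimal sketch: `preview` is a hypothetical helper, and it assumes that `data.txt` sits next to the notebook.

```python
from pathlib import Path

def preview(path, n_lines=5):
    """Return the first n_lines of a text file as a list of strings."""
    with open(path, "r") as f:
        return [line.rstrip("\n") for _, line in zip(range(n_lines), f)]

# List the files located next to the notebook
for entry in sorted(Path(".").iterdir()):
    print(entry.name)

# Peek into the data file if it is present
data_file = Path("data.txt")
if data_file.exists():
    print(preview(data_file))
```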

In [None]:
# Prepare variables to read in from data.txt

runs = []

In [None]:
# Importing Python packages

import csv

In [None]:
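# Loading source data
# newline="" is the setting recommended by the csv module documentation
with open("data.txt", "r", newline="") as read_file:
    data = csv.reader(read_file, delimiter=';')
    for row in data:
        # The first field of each row is converted to an integer and collected
        runs.append(int(row[0]))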
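The loading step can also be wrapped in a small function, which makes it easier to reuse and to skip blank lines. This is a sketch under the assumption that each row holds one integer in its first field, separated by `;` as in the cell above. The name `load_runs` is hypothetical.

```python
import csv

def load_runs(path, delimiter=";"):
    """Read the first column of a delimited text file as a list of integers."""
    values = []
    with open(path, "r", newline="") as read_file:
        for row in csv.reader(read_file, delimiter=delimiter):
            if not row or not row[0].strip():
                continue  # skip blank lines
            values.append(int(row[0]))
    return values
```

With the session's data file this would be called as `runs = load_runs("data.txt")`.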

# 3. Visualizing raw data

## Task: Check format of data in Python

In [None]:
# Solution: Check format of data in Python

runs
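Beyond printing the whole list, a few summary values often say more about the format. The sketch below uses a short made-up list in place of `runs`, and `describe` is a hypothetical helper.

```python
def describe(values):
    """Summarize a list of integer measurements."""
    return {
        "type": type(values).__name__,
        "element_type": type(values[0]).__name__,
        "length": len(values),
        "min": min(values),
        "max": max(values),
    }

sample = [10, 12, 9, 11, 10]  # made-up stand-in for `runs`
print(describe(sample))
```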

In [None]:
# Importing Python packages for plotting

# https://matplotlib.org/
# https://matplotlib.org/cheatsheets/
import matplotlib.pyplot as plt

In [None]:
# Plotting source data

plt.plot(runs)

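## Question: How can the data be plotted more appropriately?

In [None]:
# Solution: How can the data be plotted more appropriately?
# Each run is an independent measurement, so draw markers only
# instead of connecting the points with lines

plt.plot(runs, '.')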
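Axis labels make such a raw-data plot easier to read. The following sketch uses made-up values in place of `runs`. The label texts are suggestions and assume that the x axis counts the runs and the y axis shows the bin in which the ball landed.

```python
import matplotlib.pyplot as plt

sample = [10, 12, 9, 11, 10, 8, 13]  # made-up stand-in for `runs`

fig, ax = plt.subplots()
ax.plot(sample, ".")           # markers only, no connecting lines
ax.set_xlabel("Run number")
ax.set_ylabel("Bin index")
ax.set_ylim(0, 20)             # full range of bins for N = 20
```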

# 4. Data processing

In [None]:
# Importing Python packages for working with tabular data

# https://pandas.pydata.org/
import pandas as pd

## Question: How could this data be processed to give more insight into the experiment?

In [None]:
# Solution: How could this data be processed to give more insight into the experiment?

# Use pandas.Series to count how often each value occurs across all repeats
counts = pd.Series(runs).value_counts()
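Note that `value_counts()` orders its result by frequency, not by value, and that values which never occur are simply missing. A minimal sketch with made-up data shows how `reindex` can bring the result into bin order and fill in the empty bins. The names `sample_counts` and `all_bins` are illustrative.

```python
import pandas as pd

N = 20
sample = [10, 12, 9, 11, 10, 10, 9]  # made-up stand-in for `runs`

sample_counts = pd.Series(sample).value_counts()

# Order by bin index and add the bins that were never hit with a count of 0
all_bins = sample_counts.reindex(range(N + 1), fill_value=0)
print(all_bins.loc[8:12])
```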

## Task: Check format of data in Python

In [None]:
# Solution: Check format of data in Python

counts

# 5. Visualizing processed data

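## Task: Plot the processed data as a bar plot

In [None]:
# Exercise: complete the call below. plt.bar(x, height) needs the bar
# positions and the bar heights; without arguments it raises a TypeError.
plt.bar()

In [None]:
# Solution: Plot the processed data as a bar plot

plt.bar(counts.index, counts)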

## Question: What could be done to improve the plot?

In [None]:
# Solution: What could be done to improve the plot?

fig, ax = plt.subplots()

ax.bar(counts.index, counts)

ax.set_xlim(-0.1*N, 1.1*N)
ax.set_title('Experiment')

plt.show()
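Further improvements could include axis labels and integer tick marks, since the bins are discrete. The sketch below uses made-up counts, and the label texts are suggestions only.

```python
import matplotlib.pyplot as plt
import pandas as pd

N = 20
sample_counts = pd.Series([10, 12, 9, 11, 10, 10, 9]).value_counts()  # made-up data

fig, ax = plt.subplots()
ax.bar(sample_counts.index, sample_counts)
ax.set_xlim(-0.1 * N, 1.1 * N)
ax.set_xticks(list(range(0, N + 1, 2)))  # bins are integers
ax.set_xlabel("Bin index")
ax.set_ylabel("Number of runs")
ax.set_title("Experiment")
```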

# 6. Store results

In [None]:
# Store plots and processed data

fig.savefig('processed_data.png')

counts.sort_index().to_csv('processed_data.csv')
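To check that the stored table can be used again later, it can be read back with `pandas.read_csv`. This is a minimal sketch with made-up counts. The file name `sample_counts.csv` is chosen so that it does not overwrite the session's own output.

```python
import pandas as pd

sample_counts = pd.Series([10, 12, 9, 11, 10, 10, 9]).value_counts()  # made-up data
sample_counts.sort_index().to_csv("sample_counts.csv")

# index_col=0 restores the bin index; squeeze turns the one-column table into a Series
restored = pd.read_csv("sample_counts.csv", index_col=0).squeeze("columns")
print(restored)
```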

# 7. Comparing experimental data with theory

In [None]:
# Importing Python packages for numerical computing (NumPy) and statistical functions (SciPy)

# https://numpy.org/
import numpy as np

# https://scipy.org/
from scipy.stats import norm

## Theory: Normal distribution

Wikipedia:
> According to the central limit theorem (more specifically, the de Moivre–Laplace theorem), the binomial distribution approximates the normal distribution provided that the number of rows and the number of balls are both large. Varying the rows will result in different standard deviations or widths of the bell-shaped curve or the normal distribution in the bins.

![](normal_distribution.svg)
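Written out, the approximation quoted above reads as follows. The symbol $k$ for the bin index is introduced here for illustration, while $N$ and $p$ are the number of rows and the probability from the experimental setting.

$$
P(k) = \binom{N}{k}\, p^{k} (1-p)^{N-k}
\;\approx\;
\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(k-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
\mu = N p, \quad \sigma = \sqrt{N p (1-p)}
$$

For $N = 20$ and $p = 0.5$ this gives $\mu = 10$ and $\sigma = \sqrt{5} \approx 2.24$.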

In [None]:
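# Setting the parameters for a normal distribution
# Binomial mean and standard deviation: mu = N*p, sigma = sqrt(N*p*(1-p));
# for p = 0.5 these reduce to N/2 and sqrt(N)/2

mu = N*p
sigma = np.sqrt(N*p*(1 - p))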

In [None]:
# Creating a normal distribution

x = np.linspace(0, N, 100)
y = norm.pdf(x, mu, sigma)
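How good the normal approximation is for this setting can be checked numerically by comparing it with the exact binomial probabilities at the integer bin positions. The sketch below uses `scipy.stats.binom`. The names `k`, `exact`, `approx` and `max_deviation` are illustrative.

```python
import numpy as np
from scipy.stats import binom, norm

N, p = 20, 0.5
mu = N * p
sigma = np.sqrt(N * p * (1 - p))

k = np.arange(N + 1)                 # possible bin indices
exact = binom.pmf(k, N, p)           # exact binomial probabilities
approx = norm.pdf(k, mu, sigma)      # normal approximation at the same points

max_deviation = np.max(np.abs(exact - approx))
print(f"largest absolute deviation: {max_deviation:.4f}")
```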

In [None]:
# Plotting the normal distribution

fig, ax = plt.subplots()

ax.plot(x, y)

ax.set_xlim(-0.1*N, 1.1*N)
ax.set_title('Theory')

plt.show()

## Task: Create a plot which compares experiment and theory

In [None]:
# Solution: Create a plot which compares experiment and theory

fig, ax = plt.subplots()

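ax.bar(counts.index, counts)
# Scale the probability density by the number of repeats (the bin width is 1)
# so that it is comparable to the absolute counts
ax.plot(x, repeats * norm.pdf(x, mu, sigma), 'r')

ax.set_xlim(-0.1*N, 1.1*N)
ax.set_title('Experiment & Theory')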

plt.show()

fig.savefig('comparison.png')
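Besides the visual comparison, the sample mean and standard deviation can be compared with the theoretical values. Because the measured data are not reproduced here, the sketch below uses simulated values as a stand-in for `runs`, assuming that the bin index equals the number of right deflections. The seed is arbitrary.

```python
import numpy as np

N, p, repeats = 20, 0.5, 417

# Simulated stand-in for `runs`
rng = np.random.default_rng(0)
sample = rng.binomial(N, p, size=repeats)

print("mean:", sample.mean(), "theory:", N * p)
print("std: ", sample.std(ddof=1), "theory:", np.sqrt(N * p * (1 - p)))
```

With the real data, `sample` would be replaced by `np.array(runs)`.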