A Recent History of Python (Through Stack Overflow Questions and Answers)

Sam Hames

Who’s Used Stack Overflow Before?

The Prominence of Stack Overflow

“We serve about 100 million monthly visitors worldwide, making us one of the most popular websites in the world. I think we are in the top 50 of all websites in the world by traffic. Over the past 14 years, the site’s been accessed about 50 billion times.”

Prashanth Chandrasekar, Stack Exchange CEO, to ZDNET

What Stack Overflow Isn’t

*see also

Opportunity: Stack Overflow Data Is Open

Stack Overflow launched in late 2008: 15 years of questions and answers.

User-submitted content is all licensed under a version of CC BY-SA, and is archived for bulk download across all Stack Exchange sites.

How can we make use of this to better understand Python’s history?

Understanding Stack Overflow

For the rest of this talk I’ll be using the June 2023 dump from Stack Overflow.

Posts.xml - 58,665,485 questions and answers in 95GiB of uncompressed XML.

The earliest question tagged with Python.
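
A file this size will not fit in memory, so one way to look for that earliest question is to stream Posts.xml row by row. The sketch below is an illustration under stated assumptions, not the approach used in the talk: it assumes each post is a `<row>` element with `Id`, `PostTypeId` (`1` for questions), `CreationDate` and `Tags` attributes, and that tags are stored in the form `<python><list>`. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed tag encoding in the dump: "<python><list>".
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def iter_rows(source):
    """Stream the attributes of each <row> element as a plain dict."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the enclosing root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield dict(elem.attrib)
            # Drop rows that have been processed so memory stays bounded.
            root.clear()


def earliest_tagged(source, tag="python"):
    """Return (creation_date, post_id) of the earliest question with `tag`."""
    tagged = (
        (row.get("CreationDate", ""), row.get("Id"))
        for row in iter_rows(source)
        if row.get("PostTypeId") == "1"  # assumption: 1 marks a question
        and tag in TAG_PATTERN.findall(row.get("Tags", ""))
    )
    # ISO 8601 timestamps sort chronologically as plain strings.
    return min(tagged, default=None)
```

Calling `earliest_tagged("Posts.xml")` would make a single pass over the file. Comparing timestamps as strings avoids parsing a date for every row.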

Posts Over Time

Python Tag Over Time
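
A series like this could be tallied in one streaming pass over Posts.xml. The following is a minimal sketch rather than the code behind the talk: it assumes `<row>` elements with `PostTypeId`, `CreationDate` and `Tags` attributes, tags encoded as `<python><pandas>`, and that only questions carry tags. `monthly_tag_counts` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed tag encoding in the dump: "<python><pandas>".
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def monthly_tag_counts(source, tag="python"):
    """Count questions carrying `tag`, keyed by "YYYY-MM" creation month."""
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the enclosing root element
    for event, elem in context:
        if event != "end" or elem.tag != "row":
            continue
        is_question = elem.get("PostTypeId") == "1"  # assumption: 1 = question
        if is_question and tag in TAG_PATTERN.findall(elem.get("Tags", "")):
            # CreationDate is assumed to be ISO 8601, so the first
            # seven characters are the year and month.
            counts[elem.get("CreationDate", "")[:7]] += 1
        # Drop processed rows so memory stays bounded.
        root.clear()
    return counts
```

`sorted(monthly_tag_counts("Posts.xml").items())` would then give the month-by-month series in chronological order, ready for plotting.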

That’s Nice, But…

there are 58 million posts!

Solution Is Obviously To Start From Scratch And Reinvent All Of The Wheels

https://github.com/SamHames/hyperreal

It’s not just open source, it’s academic open source!

Demo

Local demo

Does This Work? Yes.

Does This Work? Yes, But…

Let’s Get To The Python!

Warmup - Q+A on Stack Overflow

Asking Questions 1

Asking Questions 2

Asking Questions 3

Fundamentals

Fundamentals - Strings and Textual Types

Fundamentals - Files and File Systems

Fundamentals - Variables

Fundamentals - Iteration

Fundamentals - Arguments

Fundamentals - Builtin Types

Fundamentals - Dates and Times

Fundamentals - Classes

From Scientific Computing to Data Science

Pandas Pandas Pandas

Machine Learning in General

Deep Learning Toolkits

Plotting and Visualisation

Django and Web Development

Scripting

Notebooks

Summary

DIY (and Tell Me What Breaks)

Recreate this analysis with this script (you’ll need ~300GiB of spare disk space and 8-12 hours of compute plus download time):

https://github.com/SamHames/hyperreal/tree/main/examples/stackoverflow

Find me on the Fediverse:

https://cloudisland.nz/@sam_hames