Errare humanum est: What do RFC Errata say about Internet Standards?
====================================================================

The data and code contained in this directory have been deposited in support of the following paper:

    Errare humanum est: What do RFC Errata say about Internet Standards?
    Stephen McQuistin (University of Glasgow), Mladen Karan (Queen Mary
    University of London), Prashant Khare (Queen Mary University of London),
    Colin Perkins (University of Glasgow), Matthew Purver (Queen Mary University of
    London), Patrick Healey (Queen Mary University of London), Ignacio Castro (Queen Mary
    University of London), and Gareth Tyson (Queen Mary University of London, HKUST)
    Proceedings of the Network Traffic Measurement and Analysis Conference (TMA Conference
    2023), Naples, Italy. June 2023.

    https://eprints.gla.ac.uk/298487/

The tooling collects data from the public repositories of the IETF and the RFC
Editor, as described in the paper, then processes it, plots the results, and
generates the paper. The gathered datasets include the full IETF mail archive
and the published RFC corpus, which together consume a significant amount of
disk space (~40GB for the full dataset).
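
Because the full dataset needs roughly 40GB, it may be worth checking free
disk space before starting. The following is a minimal sketch using only the
Python standard library; the function name `has_free_space` and the default
threshold are illustrative assumptions and are not part of the tooling.

```python
import shutil


def has_free_space(path=".", required_gb=40):
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free. The 40GB default mirrors the rough
    estimate given above; it is not an exact requirement."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    if has_free_space():
        print("Enough free space for the full dataset.")
    else:
        print("Less than ~40GB free; the full dataset may not fit.")
```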

The following dependencies are required to run the tooling and generate the paper:
    * Python 3.9
    * pipenv (tested with version 11.9.0)
    * GNU Make (tested with version 4.2.1)
    * pdfTeX (tested with version 3.14159265-2.6-1.40.20)
    * MongoDB (tested with version 3.6.8)
    * rsync (tested with version 2.6.9)
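
As a quick sanity check, the presence of these tools on the `PATH` can be
tested before running anything. This is a hedged sketch rather than part of
the tooling: the executable names below (in particular `pdftex` and `mongod`)
are assumptions about a typical installation, and `mongod` is only needed
locally if the MongoDB server runs on the same machine.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ["pipenv", "make", "pdftex", "mongod", "rsync"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

This only checks that each tool exists; it does not compare versions against
those listed above.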

The tooling makes use of a MongoDB instance to cache the data that it gathers.
This database can be on the same machine as the tooling. If this is the case,
and the server has no access restrictions, then no further setup is required. If
the MongoDB server is on a different machine, or requires authentication, then
environment variables can be set to enable access. These are:
    * IETFDATA_CACHE_HOST (defaults to `localhost`)
    * IETFDATA_CACHE_PORT (defaults to `27017`)
    * IETFDATA_CACHE_USER (optional)
    * IETFDATA_CACHE_PASSWORD (optional)
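
To illustrate how these variables fit together, the sketch below builds a
standard MongoDB connection URI from them. It is an illustration only: the
function name `cache_uri` is hypothetical, and the tooling's own connection
logic may differ. The variable names and defaults are those listed above.

```python
import os
from urllib.parse import quote_plus


def cache_uri(env=os.environ):
    """Build a MongoDB connection URI from the IETFDATA_CACHE_* variables.

    Hypothetical helper for illustration; not the tooling's own code.
    """
    host = env.get("IETFDATA_CACHE_HOST", "localhost")
    port = env.get("IETFDATA_CACHE_PORT", "27017")
    user = env.get("IETFDATA_CACHE_USER")
    password = env.get("IETFDATA_CACHE_PASSWORD")
    if user and password:
        # Credentials must be percent-encoded inside a MongoDB URI.
        return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
    # No credentials set: connect without authentication.
    return f"mongodb://{host}:{port}/"


if __name__ == "__main__":
    print(cache_uri())
```

With none of the variables set, this yields a URI for an unauthenticated
server on `localhost` port `27017`, matching the no-setup case described
above.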

## Getting started

To begin, run:

    `pipenv install`

This will create a Python virtual environment with the necessary packages installed.

Next, run:

    `pipenv run make all`

This will take a long time to run (on the order of days), and will produce the
PDF file `papers/rfc-errata.pdf`, a copy of the paper generated by the tooling.

Note that due to the nature of the tooling, and the datasets it operates on, the
results may vary slightly from those presented in the final version of the paper
(available from the link above). 

## Acknowledgements

This work is an output of the Streamlining and Social Decision Making for
Improved Internet Standards project, funded by the UK Engineering and Physical
Sciences Research Council under grants EP/S036075/1 and EP/S033564/1. Find out
more about the project at https://sodestream.github.io.

## Licence

Unless otherwise stated, the licence contained in the LICENCE file applies to all
code contained in this archive.