Dataset Open Access
Nowadays, a multitude of tracking systems produce massive amounts of maritime data on a daily basis. The most commonly used is the Automatic Identification System (AIS), a collaborative, self-reporting system that allows vessels to broadcast their identification information, characteristics and destination, along with other information originating from on-board devices and sensors, such as location, speed and heading. AIS messages are broadcast periodically and can be received by other vessels equipped with AIS transceivers, as well as by ground-based or satellite-based sensors.
Since the International Maritime Organisation (IMO) made it obligatory for vessels above 300 gross tonnage to carry AIS transponders, large datasets have gradually become available and are now considered a valid source of maritime intelligence. There is now a growing body of literature on methods of exploiting AIS data for the safety and optimisation of seafaring, including traffic analysis, anomaly detection, route extraction and prediction, collision detection, path planning and weather routing.
As the amount of available AIS data grows to massive scales, researchers are realising that computational techniques must contend with the difficulties of acquiring, storing, and processing the data. Traditional information systems are incapable of dealing with such firehoses of spatiotemporal data, where they are required to ingest thousands of data units per second while delivering sub-second query response times.
Processing streaming data shares characteristics with other big data challenges, such as handling high data volumes and complex data types. While big data batch processing techniques are sufficient for many applications, for applications such as navigation timeliness is a top priority: the right decision to steer a vessel away from danger is only useful if it is made in time. The true challenge is that, in order to satisfy real-time application needs, high-velocity data of unbounded size must be processed under time constraints and in finite memory that is small relative to the data size. Research on data streams is gaining attention as a subset of the more generic Big Data research field.
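As an illustration of processing an unbounded stream in finite memory, the following minimal sketch counts the messages seen in a trailing time window. It is not taken from the dataset authors; the function name `windowed_rate`, the `(timestamp, shipid)` tuple layout and the assumption that events arrive in non-decreasing timestamp order are all hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def windowed_rate(
    events: Iterable[Tuple[float, str]], window_s: float
) -> Iterator[Tuple[float, int]]:
    """Yield (timestamp, message count in the trailing window) per event.

    Assumption: events are (timestamp_in_seconds, shipid) pairs that
    arrive in non-decreasing timestamp order.
    """
    window: Deque[float] = deque()
    for ts, _shipid in events:
        window.append(ts)
        # Evict timestamps that have fallen out of the trailing window,
        # so memory depends on the window content, not the stream length.
        while window and window[0] <= ts - window_s:
            window.popleft()
        yield ts, len(window)


# Illustrative, made-up events (not taken from the dataset).
events = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (10.0, "c")]
rates = list(windowed_rate(events, window_s=5.0))
```

Because the function is a generator, it never holds more than the current window in memory, which is the property that matters when the input has no known end.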
Research on such topics requires an uncompressed, uncleaned dataset similar to what would be collected in real-world conditions. This dataset contains all decoded messages collected within a 24-hour period (starting from 29/02/2020 10 PM UTC) from a single receiver located near the port of Piraeus (Greece). All vessel identifiers such as IMO and MMSI have been anonymised and no down-sampling, filtering or cleaning procedure has been applied.
The schema of the dataset is provided below:
· t: the time at which the message was received (UTC)
· shipid: the anonymised id of the ship
· lon: the longitude of the current ship position
· lat: the latitude of the current ship position
· heading: the direction in which the ship's bow points (see: https://en.wikipedia.org/wiki/Course_(navigation))
· course: the direction in which the ship moves (see: https://en.wikipedia.org/wiki/Course_(navigation))
· speed: the speed of the ship (measured in knots)
· shiptype: AIS reported ship-type
· destination: AIS reported destination
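The schema above could be read along the following lines. This is a minimal sketch under stated assumptions, not the authors' tooling: it assumes the data is distributed as a CSV file with a header row using the field names listed above, it keeps `t` as a plain string because the timestamp format is not specified here, and the names `AisRecord`, `read_records` and `has_valid_position` are hypothetical. Since the dataset is not cleaned, missing or malformed numeric fields are mapped to `None` rather than raising an error.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


def _to_float(value: Optional[str]) -> Optional[float]:
    """Return a float, or None when the field is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


@dataclass
class AisRecord:
    t: str  # kept as text; the timestamp format is an assumption left open
    shipid: str
    lon: Optional[float]
    lat: Optional[float]
    heading: Optional[float]
    course: Optional[float]
    speed: Optional[float]  # knots
    shiptype: str
    destination: str

    def has_valid_position(self) -> bool:
        """True when both coordinates are present and within valid ranges."""
        return (
            self.lon is not None
            and self.lat is not None
            and -180.0 <= self.lon <= 180.0
            and -90.0 <= self.lat <= 90.0
        )


def read_records(stream: TextIO) -> Iterator[AisRecord]:
    """Lazily parse rows, assuming a CSV header that matches the schema."""
    for row in csv.DictReader(stream):
        yield AisRecord(
            t=row.get("t") or "",
            shipid=row.get("shipid") or "",
            lon=_to_float(row.get("lon")),
            lat=_to_float(row.get("lat")),
            heading=_to_float(row.get("heading")),
            course=_to_float(row.get("course")),
            speed=_to_float(row.get("speed")),
            shiptype=row.get("shiptype") or "",
            destination=row.get("destination") or "",
        )


# Illustrative, made-up rows (not taken from the dataset).
SAMPLE = (
    "t,shipid,lon,lat,heading,course,speed,shiptype,destination\n"
    "2020-02-29 22:00:01,abc123,23.62,37.94,511,182.5,0.1,70,PIRAEUS\n"
    "2020-02-29 22:00:02,def456,181.0,91.0,,,,,\n"
)
records = list(read_records(io.StringIO(SAMPLE)))
valid = [r for r in records if r.has_valid_position()]
```

With a real file, `read_records(open(path, newline=""))` would stream the rows one at a time, where `path` is whatever file name the download uses. The range check only flags implausible positions; whether to drop such rows is left to the analysis at hand.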