Getting Out The Data - Fighting The Latency Dragon
Description
Earth observation data has been archived at DLR for several decades, spanning several generations of satellite missions and different kinds of sensors. Combining information from different ages and sources allows scientists to "put the data to work" by extracting valuable higher-level information from it.
This requires the data to be available at computing facilities at just the right time. As computing power and data volumes keep growing, the challenge of getting data to users and processing systems quickly becomes ever more urgent.
Traditionally, data has been archived on tape, which still holds significant advantages over disk in cost, energy consumption and reliability of the storage media. Tape's big drawback is access latency, caused by tape mounting and spooling times.
To work around that latency in our archive, projects using large amounts of historical data, such as the DLR TIMELINE project, have tailored their data handling and processing to the existing infrastructure. This approach works well where data needs are well known, but it requires additional effort to adjust the processing chains.
To serve more general needs for fast data access, some archives are now switching to all-disk installations, while for larger data volumes the newer, more frequently accessed data is made available in online rolling archives or exploitation platforms. The downside of this approach is higher cost and energy usage, both of which grow with the amount of data kept on disk. It would therefore be desirable to limit the online disk space used while still being able to reload data quickly.
DLR has evaluated and implemented a solution that addresses the latency problem within the archive without giving up the advantages usually associated with tape. The storage medium is disk based, allowing quick random access; it uses a checksum scheme for improved redundancy and reliability, and it spins down and powers off unused disks to reduce energy usage and cooling requirements. Integrating it with our existing hierarchical storage management enables efficient control of the internal disk power management.
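The two mechanisms named above, checksum verification on access and powering down idle disks, can be illustrated with a minimal sketch. All class and function names here are hypothetical; the abstract does not describe the actual implementation, so this only shows the general idea.

```python
import hashlib
import time

class Disk:
    """Illustrative model of one archive disk with power management."""

    def __init__(self, disk_id):
        self.disk_id = disk_id
        self.powered_on = True
        self.last_access = time.monotonic()
        self.objects = {}  # object_id -> (data, sha256 hex digest)

    def store(self, object_id, data):
        # Record a checksum alongside the data when it is written.
        self.powered_on = True
        self.last_access = time.monotonic()
        self.objects[object_id] = (data, hashlib.sha256(data).hexdigest())

    def read(self, object_id):
        # Accessing the disk implicitly powers it back on.
        self.powered_on = True
        self.last_access = time.monotonic()
        data, digest = self.objects[object_id]
        # Verify the stored checksum before returning the data.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"checksum mismatch on disk {self.disk_id}")
        return data

def spin_down_idle(disks, idle_seconds=600.0):
    """Power off every disk that has been idle longer than the threshold."""
    now = time.monotonic()
    for disk in disks:
        if disk.powered_on and now - disk.last_access > idle_seconds:
            disk.powered_on = False
```

In a real installation the power management would be driven by the hierarchical storage management layer rather than a simple idle timer, but the basic trade-off is the same: disks wake on access, paying a short spin-up delay instead of tape mounting and spooling times.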
This talk will describe the challenges and the solution implemented, and report on experiences with the installation, which is integrated into our production environment.
Files
Poster_PV2023_5451_DLR_Wehn_Latency.pdf (1.5 MB, md5:bf1abaee953b4df4e5d4588f8e511cbf)