Published April 21, 2025 | Version v2
Dataset Open

Quantitative Content Analysis Data for Hand Labeling Road Surface Conditions in New York State Department of Transportation Camera Images

  • 1. University at Albany, State University of New York
  • 2. Atmospheric Sciences Research Center, University at Albany - SUNY
  • 3. UAlbany Center of Excellence
  • 4. National Center for Atmospheric Research
  • 5. Cooperative Institute for Research in the Atmosphere (CIRA)

Description

Foundational Codebook and Data: 

Traffic camera images from the New York State Department of Transportation (511ny.org) are used to create a hand-labeled dataset of images classified into one of six road surface conditions: 1) severe snow, 2) snow, 3) wet, 4) dry, 5) poor visibility, or 6) obstructed. Six labelers (authors Sutter, Wirz, Przybylo, Cains, Radford, and Evans) completed a series of four labeling trials in which reliability across all six labelers was assessed using the Krippendorff’s alpha (KA) metric (Krippendorff, 2007). The online tool by Dr. Freelon (Freelon, 2013; Freelon, 2010) was used to calculate reliability metrics after each trial, and the group achieved inter-coder reliability with a KA of 0.888 on the fourth trial. This process is known as quantitative content analysis, and three pieces of data used in this process are shared: 1) a PDF of the codebook, which serves as a set of rules for labeling images, 2) images from each of the four labeling trials, including the use of New York State Mesonet weather observation data (Brotzge et al., 2020), and 3) an Excel spreadsheet including the calculated inter-coder reliability (ICR) metrics and other summaries used to assess reliability after each trial. The data are included in NYSDOT_quantitative_content_analysis.zip.
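
As a rough illustration of the reliability metric named above, the following Python sketch computes Krippendorff's alpha for nominal labels using the standard coincidence-matrix formulation. It is not the Freelon online tool used in this work; the function name krippendorff_alpha_nominal is hypothetical, and the example labels are made up for illustration rather than taken from the trial data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per image, holding the label each coder assigned
           (use None where a coder did not label that image).
    """
    # Coincidence matrix: every ordered pair of labels within an image
    # contributes 1 / (m - 1), where m is the number of labels for that image.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # images with fewer than two labels carry no pairing information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one image with two or more labels")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Made-up example: two coders, four images, one disagreement.
    example = [["dry", "dry"], ["dry", "wet"], ["wet", "wet"], ["wet", "wet"]]
    print(round(krippendorff_alpha_nominal(example), 3))  # prints 0.533
```

Each inner list holds one image's labels, so a six-coder trial would pass six labels per image. Values near 1 indicate strong agreement beyond chance.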

The broader purpose of this work is to enable the six human labelers, after achieving inter-coder reliability, to label large sets of images independently, each contributing to a larger labeled dataset for training supervised machine learning models that predict road surface conditions from camera images. The xCITE lab (xCITE, 2023) stores camera images from 511ny.org and provides computing resources for training machine learning models.

Obstructed Class Variation:

There are many applications for labeling roadside camera images, and as a variation of the foundational codebook, an addendum codebook provides another version of labeling the obstructed class. Specifically, this variation labels an image as “obstructed” only in extreme circumstances where a camera- or image-specific problem prevents the assessment of any road surfaces. For labelers who want to use this version of the obstructed class (in the addendum codebook) together with the other five weather-related classes (in the foundational codebook), the guidance is to use both documents in tandem, applying the obstructed rules/definitions in the addendum codebook and disregarding the obstructed rules/definitions in the foundational codebook. Alternatively, the addendum codebook may be used alone in applications where the sole goal is to classify obstructed vs. not obstructed. To ensure the reliability and quality of this variation, quantitative content analysis was conducted on the addendum codebook, just as it was for the foundational codebook. Two labelers were tested with a sample of 30 images and achieved inter-coder reliability with a Krippendorff's alpha of 0.934 after one trial. The data, including the addendum codebook and labeling trial data (images and results), are included in ObstructedVariation_quantitative_content_analysis.zip.
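
The tandem use of the two codebooks described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name combined_label, its arguments, and the exact label strings are hypothetical and are not part of either codebook, which remain the authoritative source for the labeling rules.

```python
# The five weather-related classes from the foundational codebook
# (label strings are an assumption made for this illustration).
WEATHER_CLASSES = {"severe snow", "snow", "wet", "dry", "poor visibility"}


def combined_label(obstructed_per_addendum, weather_label=None):
    """Combine an addendum 'obstructed' decision with a foundational weather label.

    obstructed_per_addendum: True if the image meets the addendum codebook's
        definition of obstructed.
    weather_label: one of the five weather-related classes, assigned using the
        foundational codebook; ignored when the image is obstructed.
    """
    if obstructed_per_addendum:
        return "obstructed"
    # The foundational codebook's own obstructed rules are disregarded here,
    # so a non-obstructed image must receive one of the five weather classes.
    if weather_label not in WEATHER_CLASSES:
        raise ValueError(f"expected one of {sorted(WEATHER_CLASSES)}, got {weather_label!r}")
    return weather_label
```

For the stand-alone use case (obstructed vs. not obstructed), only the boolean decision is needed and the weather label can be omitted.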

This material is based upon work supported by the U.S. National Science Foundation under Grant No. RISE-2019758.

Files

NYSDOT_quantitative_content_analysis.zip

Files (76.1 MB)

Name Size
md5:fc7924e225393f732cc9b2545a034ebe
60.8 MB
md5:e08db80d3be56d1876de0d738fe41e08
15.2 MB
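
The MD5 checksums listed above can be used to confirm that a downloaded archive is intact. The following is a minimal Python sketch using the standard hashlib module. The helper names md5_of_chunks, md5_of_file, and matches_listed_checksum are hypothetical, and because the listing above does not show which checksum belongs to which archive, the sketch only checks whether a file's digest matches either listed value.

```python
import hashlib

# Checksums as listed in the Files section of this record.
LISTED_MD5 = {
    "fc7924e225393f732cc9b2545a034ebe",
    "e08db80d3be56d1876de0d738fe41e08",
}


def md5_of_chunks(chunks):
    """Return the hexadecimal MD5 digest of an iterable of bytes objects."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file, read in chunks to limit memory use."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_listed_checksum(path):
    """True if the file's MD5 digest equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, matches_listed_checksum("NYSDOT_quantitative_content_analysis.zip") should return True for an uncorrupted download of that archive, assuming it has been saved under that name in the working directory.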

Additional details

Funding

U.S. National Science Foundation
AI Institute: Artificial Intelligence for Environmental Sciences (AI2ES) 2019758

References