Published April 21, 2023 | Version 1.0.0
Dataset Open

ADASIND: A Diverse Wide Angle Fisheye Camera Dataset for Autonomous Driving

  • 1. National Institute of Technology, Silchar

Description

This dataset comprises images captured through a fisheye lens on Indian roads, covering a variety of real-world driving scenarios with a focus on semi-urban roads without lane dividers. The dataset includes 10,000 RGB images together with annotations for 2D bounding boxes and moving-object detection, the calibration parameters of the fisheye lens, ground truth for the RGB images, and the previous image of each corresponding frame. With this dataset, we would like to encourage the community to adapt computer vision models to the fisheye camera directly instead of using naive rectification.

The ADASIND dataset was collected by recording more than 10,000 seconds of video as a vehicle drove across Silchar city, starting from the main gate of the National Institute of Technology Silchar, at different times of day to account for traffic variations. The video was captured with a smartphone fitted with a 64-megapixel camera and a fisheye lens, mounted on a tripod stand. The rig was fixed firmly at the rear of the vehicle so that it remained stable and would not move during the recording. The vehicle drove through the city at a steady pace, capturing video footage at approximately 30 frames per second (fps).

The annotations for the various tasks are available in their respective folders and are described in detail below:

  1. 2D box annotations : This folder contains, for each image, a .txt file listing the object label and the x-y coordinates, width, and height of each object in the image. The annotations were generated by running the YOLOv5 algorithm, which uses a deep neural network to detect objects, over the set of input images. The files in this folder follow the same sequence and file names as the original RGB images. 2D bounding-box annotations have several benefits in computer vision, including object detection: they can be used to train models that identify the location and extent of objects within an image.
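As a reading aid for the box files described above, the sketch below parses one annotation line and converts it to pixel coordinates. It assumes the standard YOLOv5 text layout ("class x_center y_center width height", all values normalized to [0, 1]); verify this against the actual files before use.

```python
# Sketch: parse YOLO-format annotation lines and convert each
# normalized box to pixel coordinates (assumed YOLOv5 layout).

def parse_yolo_boxes(lines, img_w, img_h):
    """Return (class_id, x_min, y_min, w, h) tuples in pixel units."""
    boxes = []
    for line in lines:
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        x_min = (xc - w / 2) * img_w   # box center minus half-width
        y_min = (yc - h / 2) * img_h
        boxes.append((int(cls), x_min, y_min, w * img_w, h * img_h))
    return boxes

# Example: a hypothetical annotation line for a 1920x1080 frame.
boxes = parse_yolo_boxes(["0 0.5 0.5 0.25 0.5"], 1920, 1080)
print(boxes)  # [(0, 720.0, 270.0, 480.0, 540.0)]
```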

  2. Motion instance annotations : This folder contains annotations for objects that are in relative motion with respect to the vehicle. Such objects introduce bias in specific tasks like depth estimation and have to be masked out before the images are fed to a network. The files in this folder follow the same sequence as the original images and can be applied as pixel-wise masks.

  3. Calibration : A camera projects the 3D world onto a 2D image; camera calibration is the process of estimating the parameters of that projection, namely the intrinsic and extrinsic parameters. At present a single JSON file is provided with the parametric values of the camera, and users can make separate copies of the file for calibration tasks related to each original image. Together, the intrinsic and extrinsic parameters model the relationship between the 3D world and the 2D image captured by the camera. This information is essential in many computer vision applications, such as 3D reconstruction, object tracking, and augmented reality.
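As a minimal illustration of how the intrinsic parameters relate 3D points to pixels, the sketch below loads a JSON blob and applies a plain pinhole projection. The key names ("fx", "fy", "cx", "cy") and values are assumptions, not the dataset's actual schema, and a real fisheye lens additionally needs its distortion model applied on top of this.

```python
# Sketch: read intrinsic parameters from JSON and project a 3D point
# expressed in the camera frame to pixel coordinates (pinhole model).
# Key names and values below are hypothetical placeholders.
import json

calib = json.loads('{"fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 540.0}')

def project(point_3d, calib):
    """Pinhole projection of a camera-frame point (X, Y, Z) to (u, v)."""
    X, Y, Z = point_3d
    u = calib["fx"] * X / Z + calib["cx"]
    v = calib["fy"] * Y / Z + calib["cy"]
    return u, v

print(project((1.0, 0.5, 10.0), calib))  # (1060.0, 590.0)
```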

  4. Ground Truth : This is a subfolder inside the "Motion instance annotations" folder containing the correct labels for each image in the dataset. The labels are stored as images whose file names match those of the original images. Ground-truth labels are used during evaluation to assess the accuracy and performance of machine learning or deep learning models: a model's predicted labels are compared against the ground-truth labels using evaluation metrics.
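One common metric for comparing a predicted mask against such per-pixel ground truth is intersection-over-union; a minimal sketch (not the dataset's prescribed protocol):

```python
# Sketch: intersection-over-union between a predicted boolean mask
# and the ground-truth boolean mask.
import numpy as np

def iou(pred, gt):
    """IoU of two boolean masks; 1.0 when both are empty."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.zeros((4, 4), bool); pred[:2, :2] = True  # 4 predicted pixels
gt = np.zeros((4, 4), bool);   gt[:2, :4] = True    # 8 true pixels
print(iou(pred, gt))  # 0.5
```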

  5. RGB images : This folder contains the RGB images extracted from the 30 fps video recording at one frame per second, i.e., every 30th frame. The folder holds a total of 10,000 RGB images, indexed sequentially in the order they appear in the video recording.
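Given that sampling scheme, the mapping from a saved image back to its source video frame and timestamp is simple arithmetic, sketched below (assuming 0-based indexing and exactly one saved image per 30 frames).

```python
# Sketch: map a sequential image index to its source video frame and
# timestamp, assuming one image kept every 30 frames of 30 fps video.
FPS = 30
FRAME_INTERVAL = 30  # frames between consecutive saved images

def video_frame_for_image(image_index):
    """0-based video frame number for the 0-based saved-image index."""
    return image_index * FRAME_INTERVAL

def timestamp_seconds(image_index):
    """Seconds into the recording at which the image was captured."""
    return video_frame_for_image(image_index) / FPS

print(video_frame_for_image(100), timestamp_seconds(100))  # 3000 100.0
```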

  6. Previous Image (not provided) : A folder of previous images corresponding to each original image can be prepared as required from the images in the "RGB images" folder. Previous images are especially relevant in computer vision tasks involving motion prediction or scene understanding, as they let algorithms capture the dynamics of the scene and predict future states.
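Since the RGB images are indexed sequentially, the previous image for a frame can be derived from its file name. The zero-padded naming pattern below is an assumption for illustration; adapt it to the actual file names in the folder.

```python
# Sketch: derive the "previous image" file name from a sequentially
# indexed RGB file name (assumed zero-padded numeric naming).
def previous_image_name(name, pad=6):
    """'000123.png' -> '000122.png'; None for the first frame."""
    stem, ext = name.rsplit(".", 1)
    idx = int(stem)
    if idx == 0:
        return None                      # no frame before the first
    return f"{idx - 1:0{pad}d}.{ext}"

print(previous_image_name("000123.png"))  # 000122.png
```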
     

The original images can be converted to 3D tensors with RGB or grayscale channels. The 2D box annotation values can be fed to the network for a better understanding of the objects; the RGB image tensors can be combined with the intrinsic parameters to map between 3D and 2D coordinates and vice versa; the motion masks can be multiplied element-wise with their corresponding original images so that moving objects are not taken into consideration; the model can be evaluated against tensors built from the ground truth; and, finally, previous-image tensors can be fetched alongside the current image for prediction tasks.
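The tensor conversion mentioned above can be sketched as follows: a uint8 RGB array becomes a normalized 3-channel float tensor, plus a single-channel grayscale tensor. The luminance weights used here are the standard Rec. 601 values, an illustrative choice rather than anything prescribed by the dataset.

```python
# Sketch: convert a uint8 RGB image array into the tensors described
# above -- a normalized (H, W, 3) float tensor and an (H, W) grayscale
# tensor (standard Rec. 601 luminance weights, chosen for illustration).
import numpy as np

def to_tensors(rgb_uint8):
    rgb = rgb_uint8.astype(np.float32) / 255.0                 # (H, W, 3)
    gray = rgb @ np.array([0.299, 0.587, 0.114], np.float32)   # (H, W)
    return rgb, gray

img = np.full((2, 2, 3), 255, np.uint8)   # toy 2x2 white image
rgb, gray = to_tensors(img)
print(rgb.shape, gray.shape)  # (2, 2, 3) (2, 2)
# For a white pixel the grayscale value is close to 1.0.
```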

 

Files

box_2d annotations.zip

Files (4.3 GB)

md5:532998efe7fc8506729e9b49289e1b78 (2.0 MB)
md5:e9a146e1b97b335e2b766b20795498d1 (356 Bytes)
md5:30076ab4378b29bdbdba477692573c56 (48.8 MB)
md5:b842dfe396f0725bf7151326ab794a48 (4.3 GB)