The GRIFFIN Perception Dataset: Bridging the Gap Between Flapping-Wing Flight and Robotic Perception

The development of automatic perception systems and techniques for bio-inspired flapping-wing robots is severely hampered by the high technical complexity of these platforms and by the difficulty of installing onboard sensors and electronics. Besides, flapping-wing robot perception suffers from high vibration levels and abrupt movements during flight, which cause motion blur and strong changes in lighting conditions. This paper presents a perception dataset for bird-scale flapping-wing robots as a tool to help alleviate the aforementioned problems. The data include measurements from onboard sensors that are widely used in aerial robotics and suitable for dealing with the perception challenges of flapping-wing robots, namely an event camera, a conventional camera, and two Inertial Measurement Units (IMUs), as well as ground truth measurements from a laser tracker or a motion capture system. A total of 21 datasets of different types of flights were collected in three different scenarios (one indoor and two outdoor). To the best of the authors' knowledge, this is the first dataset for flapping-wing robot perception.


SUPPLEMENTARY MATERIAL
The dataset is available at: http://grvc.us.es/eye-bird-dataset

I. INTRODUCTION
In recent years, there has been a growing interest in the development of bio-inspired aerial robots. The potential advantages of these platforms, such as flapping-wing robots, over traditional rotary-wing and fixed-wing platforms in terms of energy consumption and safety in populated environments have motivated significant R&D efforts, resulting in bird-scale platforms (i.e. ornithopters), such as [1], [2], or [3], as well as insect-scale platforms [4]. However, very few of the reported flapping-wing platforms include onboard perception capabilities. We are interested in perception for bird-scale flapping-wing robots (ornithopters), which have enough payload capacity for onboard sensors and embedded computers. This work has been developed within the GRIFFIN ERC Advanced Grant, which is devoted to the development of ornithopter robots capable of navigating, maneuvering, perching, and manipulating objects autonomously. Flapping-wing flight suffers from mechanical vibrations and large abrupt movements caused by the lift and thrust generated during the downstrokes, which limit the performance of existing perception techniques [5]. Research in flapping-wing robot perception is constrained by the high complexity of these platforms. Besides, although there are a number of datasets for aerial robot perception, none of them provide sensor data collected during flapping-wing flights.

Manuscript received October 15, 2020; revised November 30, 2020; accepted January 21, 2021. This paper was recommended for publication by Cesar Cadena upon evaluation of the Associate Editor and Reviewers' comments. The authors are with the GRVC Robotics Lab, Universidad de Sevilla, Spain (email: {jrodriguezg, raultapia, jlpaneque, pgrau, ageguiluz, jdedios, aollero}@us.es).
This paper presents a perception dataset for ornithopter robots. It contains measurements gathered from lightweight onboard sensors that are commonly used in aerial robots and suitable for addressing the main challenges of perception on board flapping-wing robots: an event camera, a conventional camera, and IMUs. Event cameras have high temporal resolution and high dynamic range, which make them robust to motion blur and challenging lighting conditions. Conventional cameras and IMUs are two of the most widely used sensors in aerial robotics. Besides, the datasets include ground truth measurements from a laser tracker (in the outdoor scenarios, see Figure 1) or a motion capture system (in the indoor scenario), as well as ArUco visual markers placed in the scenario. Three types of datasets are included: 1) Base datasets, with agile maneuvers that exploit the flying capabilities of the ornithopter; 2) ArUco datasets, with ArUco markers placed in the scenario and mapped with ground truth positioning; and 3) People datasets, which provide samples for developing and evaluating object detection techniques based on events and/or visual images. The provided datasets can be useful to develop, tune, or evaluate a wide range of image-based and event-based perception techniques, from feature extraction and odometry to object detection, semantic segmentation, or SLAM, among others.

The rest of the paper is organized as follows. Section II summarizes the main works in the topics addressed in the paper. The flapping-wing platform and sensors used to collect the dataset are presented in Section III. The sensor data format is given in Section IV. Section V describes the dataset collection and evaluation. Finally, Section VI concludes the paper and highlights the main future research steps.

II. RELATED WORK
In recent years, a wide variety of drone datasets have been introduced for robotic perception tasks, see e.g., [6]-[12]. The work in [6] presents Mid-Air, a large synthetic dataset of quadcopters flying in unstructured environments that includes data from different scenarios and weather conditions generated with the AirSim simulator. The work in [8] provides a dataset including real and photo-realistic synthetic data to evaluate place recognition methods with respect to viewpoint tolerance. The AU-AIR dataset [7] presents annotated traffic surveillance data captured onboard an Unmanned Aerial Vehicle (UAV), in addition to data from a camera, GPS, and IMU. A summary of the main reported datasets for aerial robot perception is shown in Table I.
Some of the released datasets focus specifically on Micro Aerial Vehicles (MAVs) and multirotor drone racing, motivated by the challenging perception issues posed by aggressive flights. The EuRoC dataset [9] provides data of MAV flights in two different indoor scenarios: an industrial scenario for evaluating visual-inertial localization, and a room equipped with a motion capture system for evaluating 3D reconstruction techniques. The MAV dataset in [10] includes flights within the urban streets of Zurich, Switzerland. The Blackbird UAV dataset [11] approaches the problem of agile and autonomous operation of aerial vehicles in outdoor environments, with special emphasis on visual-inertial navigation, 3D reconstruction, and depth estimation. The data from real flights have been extended with additional synthetic data generated through simulation. A dataset of fast flights using a quadrotor in an outdoor scenario is presented in [12]. Four different flights were performed on an airport runway, repeating the same trajectory at four different speeds in the range [5, 17.5] m/s. All the above datasets include high-resolution images, GPS, and IMU data. Some of them also provide additional measurements, such as stereo vision data (the EuRoC dataset) or rotor tachometer data and depth images (the Blackbird UAV dataset). Although the above datasets provide measurements relevant for perception techniques for multirotors and MAVs, they are not adequate for testing and developing techniques that deal with the perception difficulties raised by flapping.
Flapping-wing robots entail specific perception challenges and require perception systems and techniques that consider the effects of generating lift and thrust through wing flapping. One of the first approaches to cope with the challenges of ornithopter perception was presented in [16]. The authors propose a vision-based stabilizing system to address the pitch and roll fluctuations during each flapping period. The system fits within the payload limit of their ornithopter, below 100 g. Recently, some datasets useful for flapping-wing robot development have been published. In a previous work we presented ROSS-LAN [15], a simulation scheme to obtain synthetic sensor data from trajectories that mimic bird flying maneuvers, and released some synthetic datasets. Also, the work in [17] includes experimental control data of an ornithopter performing landing maneuvers in a scenario equipped with a motion capture system, but does not provide onboard sensing data that could be used for perception. To the best of our knowledge, no dataset with experimental onboard measurements suitable for flapping-wing robot perception has been reported.
The strict payload capacity and weight distribution constraints, together with the vibration level and abrupt movements of flapping-wing platforms, require a careful selection of the sensors mounted onboard an ornithopter. In [5], the suitability of LIDARs, conventional cameras, and event cameras for ornithopters was analyzed, concluding that event-based vision provides a promising solution to many of these perception challenges. Event cameras capture illumination changes in the form of events with microsecond time resolution and high dynamic range. Unlike conventional cameras, event cameras are robust to motion blur and challenging lighting conditions. They have moderate weight and size, and low power consumption. The use of event-based vision onboard UAVs has received increasing research interest [18] in problems such as visual servoing [19], motion segmentation [20], surveillance tasks [21], robot localization [22], and onboard computational load management [23], among many others.
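The way event cameras capture illumination changes can be illustrated with the idealized model commonly used in the event-camera literature: a pixel fires an event when its log-intensity changes by more than a contrast threshold since its last event. The sketch below is a simplified, frame-based approximation under that assumption; the helper name `generate_events` and the threshold value are hypothetical and do not describe the DAVIS346 circuitry. It also hints at why vibrations matter: any shaking that moves edges across pixels changes their intensity and therefore triggers events.

```python
import math


def generate_events(prev_frame, curr_frame, t, contrast_threshold=0.2, eps=1e-6):
    """Emit (t, u, v, p) events for pixels whose log-intensity change exceeds the threshold.

    Hypothetical, idealized helper: a real event camera fires asynchronously per
    pixel and may emit several events for one large change; here at most one
    event per pixel is produced between two intensity frames.
    """
    events = []
    for v, (row_prev, row_curr) in enumerate(zip(prev_frame, curr_frame)):
        for u, (i_prev, i_curr) in enumerate(zip(row_prev, row_curr)):
            # eps avoids log(0) for completely dark pixels
            delta = math.log(i_curr + eps) - math.log(i_prev + eps)
            if abs(delta) >= contrast_threshold:
                polarity = 1 if delta > 0 else -1
                events.append((t, u, v, polarity))
    return events
```

For example, brightening one pixel from 100 to 150 and darkening another from 100 to 60 yields one positive and one negative event, while unchanged pixels yield none.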
A number of datasets for event-based vision have been presented exploring the advantages of event cameras onboard aerial robots [13], [14], [21], [24]. The works in [21] and [24] provide sequences recorded onboard multirotors, used to evaluate event-based methods for tracking moving objects. The Multivehicle Stereo Event Camera Dataset, presented in [13], also includes measurements from LIDAR and several IMUs. Among the vehicles used, a hexacopter provides sensor measurements under different perspectives, vehicle velocities, and illumination conditions. The work in [14] presents a dataset for autonomous drone racing including measurements from event cameras, conventional stereo cameras, an IMU, and the ground truth pose. However, none of these works have explored the use of event-based vision onboard flapping-wing robots.

III. THE EXPERIMENTAL PLATFORM
This section presents the bird-scale flapping-wing robotic platform, the onboard sensors and the ground truth instruments used in the presented datasets.

A. The GRIFFIN Eye-Bird Ornithopter
The ornithopter used in this work is an evolution of E-Flap, a custom design developed by the GRVC Robotics Lab in the GRIFFIN project, which was modified with onboard sensors and electronics for perception research.
Flapping-wing aircraft have strict restrictions in payload capacity and weight distribution. Adding payload to an ornithopter increases the lift and thrust it demands during flight, requiring higher flapping frequencies that induce higher stresses on the structural and mechanical parts. With a 1.5 m wingspan and an empty weight of 450 g, the design of Eye-Bird was optimized for maximum payload capacity and ease of payload integration, with multiple attachment points over the body using simple nuts, bolts, or cable ties. A total of 250 g was added as payload (sensors, electronics, batteries, and cables) for the presented flights. Since this payload is more than half the empty weight of the ornithopter, its location strongly affects the center of gravity (CoG) and inertias, and hence stability and maneuverability. The total length of the ornithopter is 95 cm. The 39 cm long tail moves the neutral point of the aircraft back to 25 cm from the head, which sets the rearmost CoG position that keeps the ornithopter stable. Figure 2 shows the ornithopter design and the allocation of its parts. Placing sensors and batteries at the head moves the CoG forward for better stability. Electronics were placed at the safest point, between the body tube and the wings. Although the exposed nut heads, bolts, and wires increase the total wetted surface of the aircraft and therefore its drag, this was preferred to covering the whole body with a fuselage, which would hamper electronics integration and reduce versatility.
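The effect of payload placement can be made concrete with the usual mass-weighted average of component positions, x_CoG = Σ m_i x_i / Σ m_i, measured along the body axis from the head, and a check that the CoG stays ahead of the 25 cm neutral point stated above. This is a minimal sketch: only the 450 g empty weight, the 250 g total payload, and the neutral point position come from the text; the split of the payload and every position below are hypothetical placeholders, and the helper names are illustrative.

```python
from typing import Sequence, Tuple

# Neutral point position measured from the head, in meters (value from the text).
NEUTRAL_POINT_M = 0.25


def center_of_gravity(components: Sequence[Tuple[float, float]]) -> float:
    """Mass-weighted mean position (m from the head) of (mass_kg, position_m) pairs."""
    total_mass = sum(mass for mass, _ in components)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    return sum(mass * pos for mass, pos in components) / total_mass


def margin_to_neutral_point(cog_m: float, neutral_point_m: float = NEUTRAL_POINT_M) -> float:
    """Distance (m) by which the CoG lies ahead of the neutral point.

    A positive value is required for static longitudinal stability.
    """
    return neutral_point_m - cog_m


# Hypothetical layout: masses sum to 450 g (airframe) + 250 g (payload) as in the
# text, but all positions are illustrative assumptions, not measured values.
airframe = (0.450, 0.28)      # empty airframe, CoG assumed 28 cm from the head
head_payload = (0.150, 0.05)  # sensors and batteries at the head (assumed split)
body_payload = (0.100, 0.20)  # electronics between the body tube and the wings

cog = center_of_gravity([airframe, head_payload, body_payload])
margin = margin_to_neutral_point(cog)
```

With these placeholder numbers, the head payload pulls the CoG from 28 cm to about 22 cm from the head, ahead of the neutral point, which is the qualitative effect described above.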
Aerodynamic surfaces (wings and tail) are made of an ultra-lightweight ripstop nylon fabric adhered to an optimized carbon fiber structure made of tubes and rods. This carbon fiber structure was designed as a trade-off between the aerodynamic, elastic, and inertial properties of the wing. As a result, the wing is shaped for optimal low-speed flapping flight aerodynamics and is capable of gliding with a low descent rate, which makes this ornithopter suitable for safe landing even in emergency cases, as opposed to multirotors. Wing fabric and structure weigh 82 g for a total surface of 0.44 m². Special emphasis was put on weight optimization of the wings. Lighter wings imply lower inertial forces during flapping, which become relevant at high flapping frequencies. Reducing wing inertia also reduces mechanical stresses on the flapping mechanism and the power demanded by the driving motor. Lower weight also leaves more margin for additional payload.
The tail consists of a triangular horizontal stabilizer of 0.1 m² plus a triangular rudder of half that size, providing longitudinal and directional stability and attitude control. Both surfaces are actuated with two concatenated servos, the first acting on the tail pitch and the second on the rudder direction. The tail pitch is used to control the aircraft nose pitch, and hence the forward speed. The lack of roll actuation is mitigated by the roll stability provided by the wing dihedral and by the directional stability and control provided by the rudder. The tail was trimmed for each weight distribution, taking maximum glide ratio as the criterion.

B. Sensors
The sensors on board GRIFFIN Eye-Bird were selected considering their suitability for addressing the perception challenges of flapping-wing robots and the strict weight and size constraints of the platform. Data acquisition and logging were performed with a lightweight Khadas VIM3 board equipped with a 6-core ARM CPU, a USB 3.0 interface, and 16 GB of eMMC storage. The Khadas runs Ubuntu 18.04 with ROS Melodic and stores the collected data in the ROS bag format.
A significant effort was devoted to reducing the weight of the components on board the ornithopter. The main characteristics of the onboard sensors are shown in Table II. The total weight of all sensors, electronics, and batteries, including the required cables, was lower than 250 g, close to the maximum payload capacity of the ornithopter for the desired maneuverability.
The main sensors mounted on GRIFFIN Eye-Bird are an iniVation DAVIS346 and a VectorNav VN-200. The DAVIS346 embeds three different sensors: (1) a 346x260 dynamic vision sensor (DVS) that outputs timestamped events with polarity at a maximum rate of 12 MHz and with a temporal resolution of 1 µs; (2) a 346x260 active pixel sensor (APS) that shares the pixel array of the DVS and outputs grayscale images at 40 Hz; and (3) an MPU 9250 IMU that delivers measurements at 1 kHz. To reduce its weight and fulfill the strict payload and weight distribution requirements of GRIFFIN Eye-Bird, the lens and metallic case of the DAVIS346 were removed and replaced by lighter components. Two different lenses were used, one for outdoors and one for indoors, the latter with an IR cut-off filter to cope with the OptiTrack IR emitters. Each lens weighs 5 g and has a focal length of 3.6 mm and a field of view of 83° horizontal and 68° vertical. The case was replaced with a 3D-printed PLA (polylactic acid) enclosure. The total weight of the adapted DAVIS346 was 52 g, less than a third of its original weight of ∼170 g (original lens included). The DAVIS346 was mounted at the head of the ornithopter with a pitch angle of 30°, in a protective lightweight soft case (3D printed with flexible TPU95 filament) that acts as a shock absorber in case of a frontal collision. To facilitate installation in the ornithopter head, the DAVIS346 was mounted upside down, as seen in Figure 3, which shows the reference frames of each sensor. The VectorNav VN-200 includes a high-end IMU, with a noise density and in-run bias stability several times lower than those of the IMU embedded in the DAVIS346. It was installed as close as possible to the center of gravity of the ornithopter. The measurements from the VN-200 are accessed through UART and published in ROS at 80 Hz using a custom package¹.
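Because the DAVIS346 was mounted upside down, users may want to rotate events and APS images back to an upright view, e.g. for visual inspection. A minimal sketch, assuming that "upside down" corresponds to a 180° rotation about the optical axis and that pixel coordinates are indexed from 0 on the 346x260 array; the actual frame conventions are those shown in Figure 3 and in the provided calibration files, which take precedence. The function names are hypothetical.

```python
WIDTH, HEIGHT = 346, 260  # DAVIS346 resolution (from the text)


def rotate_event_180(event, width=WIDTH, height=HEIGHT):
    """Map an event (t, u, v, p) to its pixel on an upright sensor (180° in-plane rotation)."""
    t, u, v, p = event
    return (t, width - 1 - u, height - 1 - v, p)


def rotate_image_180(image):
    """Rotate a row-major image (list of rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(image)]
```

If rotated data are combined with the provided intrinsic calibration, the principal point must be mapped in the same way; keeping the raw orientation when using the calibration files avoids this extra step.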
The VN-200 can also be used with a GPS, but none was installed due to the weight constraints imposed by flight and maneuverability. Instead, the ground truth position was obtained with a motion capture system (indoor scenario) and a Leica Nova MS50 TotalStation laser tracker (outdoor scenarios), both of which provide significantly lower localization errors than GPS.
The MS50 TotalStation is able to follow a moving reflective target and provide precise range and bearing measurements of it, which can be converted to 3D Cartesian coordinates. The target is a GRZ101 360° MiniPrism installed at the head of the ornithopter. To prevent occlusions, the TotalStation was placed at an obstacle-free location, as far as possible from the flight area, to increase the coverage of its field of view. Further, to prevent occlusions caused by the ornithopter body, the flights were arranged such that the prism faced the TotalStation during the trajectory. The strong motions of the ornithopter while flapping occasionally caused temporary tracking failures that affected only a fraction of the recorded data, which still enabled a reliable reconstruction of the ground truth position in outdoor flights. For indoor flights, an OptiTrack motion capture system with 28 cameras provided ground truth position and orientation with millimeter accuracy at 100 Hz. The ornithopter was tracked using five infrared (850 nm) LEDs installed on its main frame. The OptiTrack body frame was defined at the position of the TotalStation prism.

¹ https://github.com/grvcPerception/vn ros integration
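Converting TotalStation range and bearing to 3D Cartesian coordinates is a spherical-to-Cartesian transform. The sketch below assumes that the bearing is given as a horizontal angle (azimuth, measured from the instrument x axis towards y) and a zenith angle (measured from the vertical), both in radians, with the origin at the instrument; the actual angle conventions of the MS50 output and of the released data may differ (surveying instruments often measure azimuth clockwise from north), so the function name and conventions are illustrative assumptions.

```python
import math


def polar_to_cartesian(slope_range, azimuth, zenith):
    """Convert a range/bearing measurement to (x, y, z) in the TotalStation frame.

    Assumed conventions: azimuth counter-clockwise from +x in the horizontal
    plane, zenith angle from +z (vertical), angles in radians, range in meters.
    """
    horizontal = slope_range * math.sin(zenith)  # projection on the horizontal plane
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            slope_range * math.cos(zenith))
```

A target straight ahead on the horizon (azimuth 0, zenith π/2) at 10 m maps to approximately (10, 0, 0), and the norm of the result always equals the measured range.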

IV. DATA FORMAT AND CALIBRATION
All sensor measurements were recorded in ROS bag files as timestamped messages. The format of each sensor measurement is defined by the standard ROS message library. The events from the DAVIS346 DVS use the format defined in [25], in which each event $e$ is represented as $e = (t_s, u, v, p)$, where $t_s$ is the time at which the event was triggered, $(u, v)$ are the pixel coordinates of the event, and $p$ is the event polarity, either positive or negative. The images from the DAVIS346 APS include the raw pixel data, the image resolution, and the encoding. The measurements from the VectorNav and DAVIS346 IMUs include orientation, angular velocity, and linear acceleration. The measurements from the TotalStation and the OptiTrack are formatted as timestamped poses (without orientation in the case of the TotalStation). Each measurement is referenced w.r.t. its sensor frame, see Figure 3. The world reference frame is the frame of the TotalStation in the outdoor scenarios and that of the OptiTrack in the indoor scenario.
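A common first step with events $e = (t_s, u, v, p)$ is to accumulate those falling in a time window into a 2D image, as done for the event images discussed in Section V (40 Hz corresponds to 25 ms windows). A minimal sketch, assuming the events have already been decoded from the ROS bag into Python tuples with $t_s$ in seconds and the polarity as a boolean (true for positive); reading the bag itself is left out, and the `Event` and `accumulate` names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    t_s: float  # timestamp in seconds
    u: int      # pixel column
    v: int      # pixel row
    p: bool     # polarity: True = positive, False = negative


def accumulate(events: Iterable[Event], t_start: float, t_end: float,
               width: int = 346, height: int = 260,
               signed: bool = True) -> List[List[int]]:
    """Accumulate events with t_start <= t_s < t_end into a height x width image.

    signed=True adds +1/-1 per event according to polarity;
    signed=False counts events per pixel regardless of polarity.
    """
    image = [[0] * width for _ in range(height)]
    for e in events:
        if t_start <= e.t_s < t_end:
            image[e.v][e.u] += (1 if e.p else -1) if signed else 1
    return image
```

Signed images show edge direction (opposite polarities can cancel), while count images directly reflect event density, which is the quantity compared between platforms in Section V.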
A calibration dataset was recorded before each set of flights. The raw calibration data are provided so that users can calibrate with their tool of choice. Each calibration dataset includes images from the DAVIS346 APS, events from the DAVIS346 DVS, and measurements from the DAVIS346 IMU and the VectorNav. Additionally, a calibration file obtained with the Kalibr toolbox [26] is provided for each calibration dataset. The camera intrinsic calibration is recomputed in each calibration experiment. The camera-IMU extrinsic calibration of both IMUs is fixed to the most consistent calibration obtained by Kalibr across all the datasets, since it is less likely to change between flights and its estimate is more affected by noise in the calibration data. Also, a 2-hour dataset of IMU measurements with the ornithopter at rest is provided so that users can characterize the IMUs, e.g., using the Allan variance method². The calibrations were further validated by using them to run a VIO (Visual-Inertial Odometry) algorithm on all the conducted flights, see Section V.
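For the IMU characterization mentioned above, the non-overlapping Allan variance of a signal sampled every τ0 seconds, for a cluster time τ = mτ0, is σ²(τ) = 1/(2(K−1)) Σ_{k=1}^{K−1} (ȳ_{k+1} − ȳ_k)², where ȳ_k are the averages of K consecutive clusters of m samples. The sketch below implements this textbook estimator in plain Python as an illustration; the function names are hypothetical, and dedicated tools (e.g. overlapping estimators) are preferable for a 2-hour recording.

```python
import math
from typing import List, Sequence, Tuple


def allan_deviation(samples: Sequence[float], m: int) -> float:
    """Non-overlapping Allan deviation for clusters of m samples."""
    k = len(samples) // m  # number of complete clusters
    if k < 2:
        raise ValueError("need at least two complete clusters")
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(k)]
    avar = sum((means[i + 1] - means[i]) ** 2 for i in range(k - 1)) / (2 * (k - 1))
    return math.sqrt(avar)


def allan_curve(samples: Sequence[float], tau0: float,
                cluster_sizes: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (tau, sigma(tau)) pairs, with tau = m * tau0 for each cluster size m."""
    return [(m * tau0, allan_deviation(samples, m)) for m in cluster_sizes]
```

Plotted on log-log axes, the region with slope −1/2 corresponds to white noise, and reading that line at τ = 1 s gives the noise density.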

V. THE GRIFFIN DATASETS
First, to better understand the effect of wing flapping on the onboard sensors, we performed a set of flights in which the trajectories executed by the ornithopter were reproduced by a multirotor UAV. The flights were repeated until the multirotor trajectories were similar to those performed by the ornithopter. The multirotor platform was a DJI Flame-Wheel F450 frame with a PixRacer autopilot, equipped with a DAVIS346 (mounted with a 30° pitch rotation) and a Khadas VIM3 board for data logging using ROS, see Figure 4.
As expected, the experimental results show that the ornithopter suffers from significantly stronger vibrations than the multirotor. On average, the DAVIS346 IMU registered accelerations 3 times greater in Eye-Bird than in the multirotor. These vibrations had a direct impact on event generation: 6 times more events were triggered in Eye-Bird than in the multirotor. Figures 5-a and 5-b show the event images accumulated at 40 Hz on both platforms when pointing to a similar part of the scenario; the ornithopter flight generated a significantly greater number of events than the multirotor flight. Although this effect depends on the scenario and type of flight, these differences were consistently observed in all the flights performed. As an example, Figure 5 [...] both platforms described a similar trajectory, confirming the intuition. Additionally, the abrupt changes in the ornithopter pitch angle during flapping often resulted in underexposed and overexposed images in outdoor scenarios, which imposes additional lighting robustness requirements on vision-based techniques for ornithopters.

The dataset collection consisted of recording data from onboard and external sensors during the ornithopter flights. The measurements from the onboard sensors were recorded on the Khadas VIM3, while those from the TotalStation and OptiTrack were recorded on an external computer. The clocks of the Khadas VIM3 and the external computer were synchronized using the Network Time Protocol (NTP) [27].
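NTP estimates the clock offset between two computers from the four timestamps of one request/response exchange: t1 (client send), t2 (server receive), t3 (server send), and t4 (client receive), where t1 and t4 are read on the client clock and t2 and t3 on the server clock. The standard estimates are θ = ((t2 − t1) + (t3 − t4))/2 for the offset and δ = (t4 − t1) − (t3 − t2) for the round-trip delay. The sketch below only evaluates these formulas for illustration; the actual synchronization between the Khadas VIM3 and the external computer was performed by an NTP implementation, and the numbers in the example are hypothetical.

```python
from typing import Tuple


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Return (offset, round_trip_delay) in seconds from one NTP exchange.

    offset > 0 means the server clock is ahead of the client clock.
    The offset estimate assumes symmetric network paths.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical exchange: server 0.5 s ahead, 10 ms one-way latency, 1 ms processing.
offset, delay = ntp_offset_delay(100.000, 100.510, 100.511, 100.021)
```

In this example the estimated offset is 0.5 s and the round-trip delay 20 ms; asymmetric latencies would bias the offset by half the asymmetry.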
Data recording (of onboard and external sensors) was started before each flight. Just before launch, the ornithopter was oriented such that the DAVIS346 pointed towards a referenced calibration grid in order to add visual references and features. Next, the ornithopter was smoothly launched towards the flight area. Once in the air, the ornithopter was guided towards the landing area while performing different types of trajectories. Figure 6-top shows the number of events generated per millisecond during one of the dataset flights (Hills Base 3). The three flight stages (launching, flapping, and landing) can be distinguished in the graph, since event generation is affected by different dynamics in each stage. Figure 6-bottom shows $\|{}^{W}\mathbf{r}\|$, the norm of the position vector of the TotalStation prism expressed in world coordinates (W), throughout that flight. The temporal synchronization can be verified by comparing both graphs (e.g., the maximum number of events per millisecond coincides with the ornithopter landing). The ornithopter flights had different durations depending on the maneuvers performed and the type of dataset. Three types of datasets were recorded:
• In Base datasets, the ornithopter described agile trajectories until it reached a safety altitude for landing. There were no artificial landmarks in the scenario, and the maneuvers varied depending on the wind conditions.
• In ArUco datasets, the ornithopter described smooth trajectories, trying to fly over ArUco markers placed on the ground to add ground truth references and features to the scene.
• In People datasets, the ornithopter flew over people and objects with the aim of providing samples for object detectors. To cover larger areas, the ornithopter described longer trajectories without altitude limitation, and hence these datasets were recorded without measurements from the TotalStation.
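Curves like those in Figure 6 can be reproduced by binning event timestamps into 1 ms bins and taking the Euclidean norm of each ground truth position. A minimal sketch, assuming timestamps in seconds and positions as (x, y, z) tuples already expressed in the world frame; the function names are hypothetical. The 12,000 events/ms limit is the DAVIS346 maximum throughput quoted in this paper.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

MAX_EVENTS_PER_MS = 12_000  # DAVIS346 maximum throughput: 12 million events per second


def events_per_ms(timestamps_s: Iterable[float]) -> Dict[int, int]:
    """Count events in each 1 ms bin, keyed by floor(t / 1 ms)."""
    return dict(Counter(math.floor(t * 1000.0) for t in timestamps_s))


def position_norms(positions: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Euclidean norm ||r|| of each world-frame position vector."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in positions]


def within_throughput(counts: Dict[int, int], limit: int = MAX_EVENTS_PER_MS) -> bool:
    """True if no 1 ms bin exceeds the sensor throughput limit."""
    return all(c <= limit for c in counts.values())
```

Plotting the bin counts against time and the norms against the ground truth timestamps yields the two panels of Figure 6; the peak bin can then be checked against the sensor limit.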
The datasets were recorded in two outdoor environments and one indoor environment, see Figure 7. The Soccer outdoor scenario is a small soccer field surrounded by obstacles of different heights, such as trees, benches, and fences. Its total area is 48 × 54 m. The ornithopter was launched from a small bridge structure in front of the yard, while the TotalStation was located at an elevated spot ∼74 m from the launching point. The Hills outdoor scenario is a large, obstacle-free, irregular area with hills and slopes. Its size (170 × 100 m) enables longer flights without obstacles or constrained flight space. The distance between the TotalStation and the launching spot was ∼135 m. The Testbed indoor scenario was a 15 × 21 × 8 m room designed for testing ornithopters and equipped with an OptiTrack motion capture system. In the simplest indoor trajectories, Eye-Bird was controlled autonomously with a simplified method inspired by [17], which used the OptiTrack measurements. These datasets can be of interest for developing onboard perception techniques for closing the low-level control loop. In the outdoor flights and the most complex indoor flights, Eye-Bird was controlled manually by an expert pilot to better compensate for wind disturbances and to perform richer trajectories that exploit the maneuverability of the ornithopter and highlight the challenges that flapping poses for perception. Table III lists the 21 provided datasets: 9 of them recorded in the Testbed scenario, 7 in the Soccer scenario, and 5 in Hills. A total of 10 ArUco and 9 Base datasets are provided because of their relevance for perception techniques in partially structured (ArUco) and fully unstructured (Base) environments. People datasets were recorded in the Soccer scenario, and annotated bounding boxes are provided for them to facilitate training of people detection algorithms. For safety, we did not record People datasets in the Testbed.
Also, in the Hills scenario the robot flew at high altitudes, which hampered the correct identification of objects.

[Table III caption fragment: ... [28]; and availability of annotated data. w is not provided in the Hills and Soccer scenarios, as the TotalStation provides only position ground truth.]
The maximum event throughput of the DAVIS346 is 12 million events per second, i.e., 12,000 events per millisecond. In all flights, the number of events generated per millisecond remained below this throughput, so no event was discarded in any dataset. For instance, Figure 6-top shows the events triggered during the Hills Base 3 flight: the number of generated events stays below the maximum event throughput of the DAVIS346 even when the ornithopter impacts the ground. In all conducted flights, the maximum event rate occurs when the ornithopter impacts the ground, and this value is at least 5 times larger than the mean event rate during the rest of the flight.

To complement the dataset, a VIO algorithm was evaluated on all the flights and compared to the provided ground truth. This is valuable for confirming the usability and validity of the dataset, assessing the quality of the given calibration and ground truth, and providing a baseline result for comparison with other VIO methods. The chosen method was ROVIO [28], a robust and well-known method capable of estimating the robot trajectory without intensive parameter tuning. Figure 8 shows the ornithopter trajectory estimated with ROVIO versus the ground truth in one flight of each scenario. The absolute translation RMSE (root mean square error) of the VIO trajectory in each flight w.r.t. the ground truth position is provided in Table III. The execution of ROVIO was stopped before landing, where no visual features are present. The absolute error of ROVIO in the Hills ArUco2 dataset is caused by drift and confirms the limitations of existing techniques in flapping-wing flights.
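The absolute translation RMSE compares estimated and ground truth positions at matching times: RMSE = sqrt((1/N) Σ ||p̂_i − p_i||²). A minimal sketch, assuming both trajectories are expressed in the same world frame with any required alignment already applied, and that each estimate is paired with the nearest-in-time ground truth sample within a tolerance; the tolerance value and function names are hypothetical, and this is not necessarily the exact evaluation procedure used for Table III.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Stamped = Tuple[float, Point]  # (time_s, (x, y, z))


def associate(estimate: Sequence[Stamped], ground_truth: Sequence[Stamped],
              max_dt: float = 0.02) -> List[Tuple[Point, Point]]:
    """Pair each estimate with the nearest-in-time ground truth sample.

    ground_truth must be sorted by time; estimates with no ground truth
    sample within max_dt seconds are skipped.
    """
    gt_times = [t for t, _ in ground_truth]
    pairs = []
    for t, p in estimate:
        i = bisect.bisect_left(gt_times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gt_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(gt_times[k] - t))
        if abs(gt_times[j] - t) <= max_dt:
            pairs.append((p, ground_truth[j][1]))
    return pairs


def translation_rmse(pairs: Sequence[Tuple[Point, Point]]) -> float:
    """Root mean square of the Euclidean position errors."""
    if not pairs:
        raise ValueError("no associated samples")
    sq = [sum((a - b) ** 2 for a, b in zip(p_est, p_gt)) for p_est, p_gt in pairs]
    return math.sqrt(sum(sq) / len(sq))
```

Because the outdoor ground truth comes from the TotalStation, only positions are compared here; orientation errors would require the OptiTrack data available indoors.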

VI. CONCLUSIONS AND FUTURE WORK
The development of perception techniques for flapping-wing robots faces a number of issues. First, the strong mechanical vibrations and sudden motions of these platforms cause motion blur and drastic changes in lighting conditions. Besides, the lack of available ornithopters with suitable payload capacity and the difficulty of developing these platforms set an additional entry barrier.
This paper presented a perception dataset for bird-scale flapping-wing robots, i.e. ornithopters, recorded in two outdoor scenarios and one indoor scenario. It includes measurements from an event camera, a conventional camera, and two IMUs, as well as ground truth data from a laser tracker (in the outdoor scenarios) and a motion capture system (in the indoor scenario). The provided data include datasets with agile trajectories that exhibit the maneuverability of the ornithopter, datasets with smooth trajectories and ArUco markers in the scenario, and datasets for developing and evaluating event-based and image-based object detection techniques.
The development of perception techniques based on the aforementioned sensors for online execution on board the ornithopter is the object of ongoing research. The motion blur and sudden lighting changes observed in the presented datasets support the use of event cameras. The vast amount of information captured by the event camera during the ornithopter flight, together with the strong payload limitations of these platforms, requires a significant effort to develop event-based vision methods that can be executed online and onboard. This work intends to set the baseline for future research and to help pave the way towards perception systems that give ornithopter robots the capabilities needed to perceive and interact with the environment.