Automatic vehicle detection from aerial imagery is of interest for various applications such as traffic management, parking surveillance, urban planning and emission calculation. Currently, most datasets available for vehicle detection algorithms are based on images of the Global North. As a result, existing algorithms are adapted to conditions there and are of limited use in the Global South, where, for example, the total number and the types of vehicles differ.
We present the XWHEEL dataset, which contains vehicles annotated in six classes on aerial photographs. The classes are based on the number of wheels, the size and the motorisation of the vehicles. The dataset consists of 73 annotated aerial images of the city of Dar es Salaam (Tanzania) with 15,973 vehicles and a ground sampling distance (GSD) of 7 cm. To evaluate the performance of our model, we divide the 73 images into three separate sets: a training set of 39 images, a validation set of 17 images and a test set of 17 images.
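As a rough illustration of the 39/17/17 partition, the sketch below splits 73 image identifiers into three disjoint sets. The random assignment, the seed and the `img_000`-style identifiers are assumptions made for this example; the text does not say how the images were actually assigned to each set.

```python
import random

def split_images(image_ids, n_train=39, n_val=17, seed=0):
    """Split image ids into disjoint train/val/test lists.

    The set sizes follow the text (39/17/17 for 73 images); the
    shuffling strategy is an assumption, not the dataset's method.
    """
    ids = sorted(image_ids)               # fixed starting order
    random.Random(seed).shuffle(ids)      # reproducible shuffle
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # whatever remains
    return train, val, test

# Hypothetical identifiers standing in for the 73 aerial images.
image_ids = [f"img_{i:03d}" for i in range(73)]
train, val, test = split_images(image_ids)
print(len(train), len(val), len(test))
```

Splitting by image rather than by individual vehicle keeps all annotations of one photograph in the same set, which avoids leaking near-identical scene content between training and evaluation.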
In this dataset, we first classify vehicles into three types based on the number of wheels: two-wheeled vehicles (2w), three-wheeled vehicles (3w) and four-wheeled vehicles (4w). In addition, we consider further attributes: size (large or small) for 4w and motorisation (motorised or non-motorised) for 2w and 3w. We also record attributes that reflect the appearance of the vehicles in the image, namely occluded, difficult and uncertain.
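The way the six classes arise from wheel count plus one further attribute can be sketched as follows. The label strings (for example `2w_motorised`) are hypothetical names chosen for this illustration and may not match the identifiers used in the dataset files.

```python
# Attribute recorded for each wheel count, as described in the text.
# The string labels themselves are hypothetical, not the dataset's own.
WHEEL_ATTRIBUTES = {
    "2w": ("motorised", "non-motorised"),
    "3w": ("motorised", "non-motorised"),
    "4w": ("large", "small"),
}

# Appearance flags mentioned in the text; they qualify a single
# annotation rather than define a class.
APPEARANCE_FLAGS = ("occluded", "difficult", "uncertain")

def class_names():
    """Enumerate every wheel-count/attribute combination."""
    return [
        f"{wheels}_{attr}"
        for wheels, attrs in WHEEL_ATTRIBUTES.items()
        for attr in attrs
    ]

CLASSES = class_names()
print(len(CLASSES))  # 6
print(CLASSES)
```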
We propose merging the non-motorised 2w (bicycles) with the motorised 2w, as even the annotators were often unsure how to tell them apart and the bicycle class is very rare overall. As the dataset was created to calculate emissions from informal transport, the focus was on 3w. However, we found too few non-motorised 3w in our dataset to train the detector and propose omitting this class for the time being.
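One possible way to apply this proposal before training is a simple label remapping, sketched below. The label strings are hypothetical, and dropping non-motorised 3w annotations entirely is only one reading of "neglecting" the class; treating them as ignore regions would be another.

```python
# Hypothetical label names; None marks a class that is left out.
MERGE_MAP = {
    "2w_motorised": "2w",
    "2w_non-motorised": "2w",      # bicycles merged into one 2w class
    "3w_motorised": "3w_motorised",
    "3w_non-motorised": None,      # too few examples to train on
    "4w_large": "4w_large",
    "4w_small": "4w_small",
}

def remap_labels(labels):
    """Return training labels after merging 2w and dropping rare 3w."""
    remapped = []
    for label in labels:
        target = MERGE_MAP[label]
        if target is not None:
            remapped.append(target)
    return remapped

example = ["2w_motorised", "2w_non-motorised", "3w_non-motorised", "4w_small"]
print(remap_labels(example))  # ['2w', '2w', '4w_small']
```

Under this mapping, four classes remain for the detector: 2w, motorised 3w, large 4w and small 4w.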
Example from the dataset: the vehicles are annotated as polygons with four points (the first point being at the left front of the vehicle), and the colour of each polygon indicates its class.
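Because each vehicle is a four-point polygon and the GSD is 7 cm, a pixel-space annotation can be converted to an approximate ground footprint. The sketch below uses the shoelace formula and assumes the points are given as pixel coordinates listed consecutively around the polygon; the example coordinates are made up and the storage format of the real annotations is not specified here.

```python
GSD_M = 0.07  # ground sampling distance in metres per pixel (7 cm)

def polygon_area_px(points):
    """Area of a simple polygon in square pixels (shoelace formula).

    Assumes the points are listed consecutively around the outline;
    the direction (clockwise or not) does not matter.
    """
    area2 = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]   # wrap around to the first point
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2.0

def footprint_m2(points, gsd_m=GSD_M):
    """Convert the pixel area to square metres using the GSD."""
    return polygon_area_px(points) * gsd_m ** 2

# Made-up annotation: 25 px wide and 60 px long, first point at the
# left front of the vehicle.
car = [(0, 0), (25, 0), (25, 60), (0, 60)]
area = footprint_m2(car)
print(round(area, 2))  # 7.35
```

At 7 cm per pixel, the 25 by 60 pixel rectangle corresponds to roughly 1.75 m by 4.2 m, which is a plausible footprint for a small four-wheeled vehicle.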