L3Pilot Common Data Format

The European research project L3Pilot focuses on large-scale piloting of SAE Level 3 functions. Its 34 partners are currently running tests and collecting data in ten countries. All log data from the test vehicles is converted into a Common Data Format (CDF), a file format that the project created and now promotes for open collaboration. The format has recently been made available at https://github.com/l3pilot/l3pilot-cdf.

The L3Pilot-CDF defines a detailed structure for storing sensor data one trip at a time. From a data collection perspective, thousands of signals are available from the various devices in an AD vehicle. The common signal list in L3Pilot is therefore drastically shortened so that it includes only the signals needed for later analysis, an approach that follows the FESTA evaluation methodology. The signal list caters for several impact assessment areas, mainly driver behaviour, user experience, mobility, safety, efficiency, environment, and socio-economics. Even so, it remains a prioritized selection that keeps file sizes manageable.

The L3Pilot-CDF builds on HDF5, a hierarchical data format developed by the HDF Group. Using HDF5 as a basis ensures that a variety of data tools and programming languages can readily interface with the files. This interoperability is the main advantage of HDF5 for the project: the joint computation framework is built on MATLAB, but many vehicle owners’ data conversion scripts and advanced image processing frameworks use other languages such as Python. HDF5 also offers efficient compression and metadata features.

The L3Pilot-CDF has already enabled joint software development of analysis scripts in L3Pilot, including indicator calculation, event and driving situation detection, and support for video annotation. The format and the harmonized indicators will be used to evaluate 13 vehicle owner datasets.

The L3Pilot-CDF uses three main components of the HDF5 file format: datasets, groups and metadata. There are three groups on the top level: root (“/”), externalData and annotations. Under root, there are six datasets: egoVehicle, laneLines, objects, positioning, derivedMeasures and performanceIndicators, all described in Table 1.
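The layout described above can be checked programmatically. The following is a minimal sketch in Python, not part of the L3Pilot toolchain: the function names are hypothetical, and it assumes that the datasets and groups appear directly under the root group with exactly the names used in this text. Reading a real file relies on the third-party h5py package, which is imported only when needed.

```python
"""Check the top-level layout of an L3Pilot-CDF file (illustrative sketch)."""

# Names taken from the description in the text; that real files spell them
# exactly this way is an assumption.
EXPECTED_DATASETS = (
    "egoVehicle", "laneLines", "objects", "positioning",
    "derivedMeasures", "performanceIndicators",
)
EXPECTED_GROUPS = ("externalData", "annotations")


def missing_entries(top_level_names):
    """Return the expected datasets and groups absent from the given names."""
    present = set(top_level_names)
    expected = EXPECTED_DATASETS + EXPECTED_GROUPS
    return [name for name in expected if name not in present]


def top_level_names(path):
    """List the entries directly under the root group ("/") of an HDF5 file."""
    import h5py  # third-party; imported lazily so the checker runs without it

    with h5py.File(path, "r") as cdf:
        return list(cdf.keys())


if __name__ == "__main__":
    # Hypothetical trip holding only vehicle-collected data, before processing.
    raw_trip = ["egoVehicle", "laneLines", "objects", "positioning"]
    print(missing_entries(raw_trip))
```

For a file on disk, `missing_entries(top_level_names("trip.h5"))` would report which of the expected entries are not yet present; the file name here is only a placeholder.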

Table 1 L3Pilot-CDF dataset descriptions

In L3Pilot, the data collected from vehicles is stored in egoVehicle, laneLines, objects and positioning. The derivedMeasures and performanceIndicators datasets are computed in the L3Pilot processing toolchain. Any manual (or automatically calculated) video annotations are added after the derivedMeasures have been computed. Image 1 below illustrates this iterative approach:

Image 1 L3Pilot-CDF iterative data creation and annotation process

The L3Pilot-CDF is open for collaboration. By opening up the format, the project members’ ambition is to allow further refinement of the format as well as the building of new applications that take advantage of it. The format is intended to manage heterogeneous data sources in automated driving test and evaluation projects, and to support data sharing. There are already examples of portability across different tools and environments: Windows and Linux, using Python, R and MATLAB.


https://github.com/l3pilot/l3pilot-cdf, accessed on 2020-03-30.

https://www.hdfgroup.org/solutions/hdf5/, accessed on 2020-03-30.

Hiller, J., Svanberg, E., Koskinen, S., Bellotti, F., Osman, N., 2020. The L3Pilot Common Data Format – enabling efficient automated driving data analysis. ESV conference 2020, Paper Number 19-0043. Available at http://indexsmart.mirasmart.com/26esv/PDFfiles/26ESV-000043.pdf, accessed on 2020-03-30.
