Abstract
Railways worldwide are increasingly looking to integrate their data resources and apply advanced analytics in order to enhance traffic management, provide new insights into the health of infrastructure assets, provide soft linkages to other transport modes, and ultimately serve their customers better. As in many industrial sectors, the rail industry has invested heavily over the past decade in sensing technologies that record every aspect of the operation of the railway network. However, as any data scientist knows, it does not matter how good an algorithm is: if you put rubbish in, you get rubbish out. As the traditional industry model of working with data only within the system that collected it becomes increasingly fragile, the industry is discovering that it knows less than it thought about the data it is gathering. When this is coupled with legacy data resources of unknown accuracy, such as design diagrams for assets that in many cases are decades old, the rail industry faces a crisis in which its data may become essentially worthless because its quality is poorly understood. This paper reports the findings of the first phase of a three-phase systematic review of the literature on how data quality can be managed and evaluated in the rail domain. It begins by discussing why data quality matters in a rail context, then defines data quality, and finally introduces and expands on the concept of a data quality schema.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2017 IEEE International Conference on Big Data (BIGDATA) |
Publisher | IEEE Xplore |
Pages | 3792-3799 |
ISBN (Electronic) | 9781538627150 |
Publication status | Published - 15 Jan 2018 |
Event | 2017 IEEE International Conference on Big Data - Westin Copley Plaza Hotel, 10 Huntington Avenue, Boston, MA 02116, United States. Duration: 11 Dec 2017 → 14 Dec 2017 |
Conference
Conference | 2017 IEEE International Conference on Big Data |
---|---|
Abbreviated title | BigData 2017 |
Country/Territory | United States |
City | Boston |
Period | 11 Dec 2017 → 14 Dec 2017 |
Keywords
- Data quality
- Rail
- Quality by design
- Data quality schema