Using Hadoop and Cassandra for Taxi Data Analytics: A Feasibility Study
Technical Paper Series
  • Alvin Jun Yong KOH, Singapore Management University
  • Xuan Khoa NGUYEN, Singapore Management University
  • C. Jason WOODARD, Singapore Management University
Publication Type
Working Paper
Publication Date
6-2010
Abstract

This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using Apache Hadoop, an implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. Although configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks when running the benchmarks because the compute nodes lacked local disk storage. This design limitation significantly hinders the analysis of large data sets using cloud computing technologies.
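To make the MapReduce pattern mentioned above concrete, here is a minimal local sketch of a job over GPS traces. It assumes a hypothetical record format of `taxi_id,timestamp,lat,lon` and a hypothetical task (counting GPS points per taxi); the paper's actual trace schema and analysis jobs are not described in this abstract. Hadoop would run many map and reduce tasks in parallel across nodes and perform the shuffle itself; here all three phases run sequentially in one process, purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed (hypothetical) record format: "taxi_id,timestamp,lat,lon".


def map_trace_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (taxi_id, 1) for each well-formed GPS record."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return  # skip malformed lines instead of failing the whole job
    yield fields[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (the framework does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the per-record counts for one taxi."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially on an iterable of trace lines."""
    intermediate = (pair for line in lines for pair in map_trace_line(line))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = [
        "T1,2010-06-01T08:00:00,1.35,103.82",
        "T2,2010-06-01T08:00:05,1.30,103.85",
        "T1,2010-06-01T08:00:30,1.36,103.83",
        "malformed",
    ]
    print(run_job(sample))  # {'T1': 2, 'T2': 1}
```

Because each map call depends only on its own input line and each reduce call only on one key's values, both phases can be partitioned across machines. Reading every input record from storage in the map phase, however, also makes such jobs sensitive to the kind of I/O bottleneck the abstract reports.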

Keywords
  • taxi fleet management
  • GPS data
Publisher
Singapore Management University School of Information Systems
City or Country
Singapore
Citation Information
Alvin Jun Yong KOH, Xuan Khoa NGUYEN and C. Jason WOODARD. "Using Hadoop and Cassandra for Taxi Data Analytics: A Feasibility Study" Technical Paper Series (2010)
Available at: http://works.bepress.com/cjwoodard/2/