Discovering Indirect Associations in Crash Data Through Probe Attributes
Presented at the Annual Meeting of the Transportation Research Board, January 2008, Washington, D.C., and accepted for publication in the Transportation Research Record: Journal of the Transportation Research Board. Copyright, National Academy of Sciences. Abstract posted with permission of TRB. For complete paper, please link to http://pubsindex.trb.org
NOTE: At the time of publication, the author Anurag Pande was not yet affiliated with Cal Poly.
Association analysis is a popular data-mining technique for detecting dependencies in transaction databases. Algorithms for discovering association rules attempt to find products that tend to sell together. For this purpose, one can treat crashes as individual transactions and crash characteristics as the "products" in each transaction, and thereby discover patterns of crash characteristics that tend to coexist. The Apriori algorithm used to search for association rules imposes a lower bound on the intersection (i.e., joint) frequency of two crash characteristics to ensure that only significant patterns are discovered. Hence, remarkable associations relating crash characteristics whose joint frequency is lower than the specified threshold cannot be identified. The present study proposes applying probe attributes and indirect associations to overcome this problem for crash data. Two sets of items, X and Y, are indirectly related if there exists another set P (i.e., a probe attribute) such that, while X and Y rarely or never occur together, both are highly dependent on the common probe attribute P. Indirect associations may be particularly useful in the context of arterial crash patterns because several products are derived from the same original nominal variable and therefore have no intersection (i.e., zero joint frequency) in the market basket database. For example, the products rain and cloudy, both created from the original variable weather, have zero joint frequency. The probe attribute-based analysis revealed some remarkable arterial crash patterns for the SR-50 corridor (in central Florida), including an association between morning and afternoon peak-period crashes through the site location driveway access. It was also found that young drivers are prone to commit errors on arterial sections that are not level, not straight, or neither.
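The definition of an indirect association given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of lift as the dependence measure, the threshold values, and the ten toy crash records are all assumptions made for illustration and are not data from the SR-50 corridor.

```python
from itertools import combinations


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def lift(transactions, a, b):
    """Lift of items a and b; values above 1 suggest positive dependence."""
    sa = support(transactions, [a])
    sb = support(transactions, [b])
    if sa == 0 or sb == 0:
        return 0.0
    return support(transactions, [a, b]) / (sa * sb)


def indirect_associations(transactions, max_joint=0.05,
                          min_probe_support=0.2, min_lift=1.2):
    """Return (x, y, p) triples where x and y rarely co-occur but both
    depend on the probe item p. All thresholds are illustrative assumptions."""
    items = sorted(set().union(*transactions))
    found = []
    for x, y in combinations(items, 2):
        # x and y must have low (possibly zero) joint frequency.
        if support(transactions, [x, y]) > max_joint:
            continue
        for p in items:
            if p in (x, y):
                continue
            # Both x and y must co-occur with, and depend on, the probe p.
            if (support(transactions, [x, p]) >= min_probe_support
                    and support(transactions, [y, p]) >= min_probe_support
                    and lift(transactions, x, p) >= min_lift
                    and lift(transactions, y, p) >= min_lift):
                found.append((x, y, p))
    return found


# Hypothetical toy crash records: each crash is a set of characteristics.
crashes = (
    [{"am_peak", "driveway"}] * 3
    + [{"pm_peak", "driveway"}] * 3
    + [{"off_peak", "intersection"}] * 4
)

result = indirect_associations(crashes)
```

In this toy data, am_peak and pm_peak come from the same nominal variable (time of day) and never co-occur, so a minimum-support search on their joint frequency would never pair them. Both depend on driveway, however, so the sketch reports them as indirectly associated through that probe item.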
Anurag Pande and Mohamed Abdel-Aty. "Discovering Indirect Associations in Crash Data Through Probe Attributes." Transportation Research Record: Journal of the Transportation Research Board, 2008.
Available at: http://works.bepress.com/apande/25