
Big Data: Stanford Policing Project

Project Description

This project examined racial disparities in traffic stops using large-scale data from the Stanford Open Policing Project. Millions of records were processed in Databricks using Apache Spark to explore stop outcomes, search rates, and arrest probabilities across racial groups. The analysis aimed to uncover potential systemic bias in law enforcement practices across multiple U.S. states.


Skills Showcased:

  • Distributed data processing using Apache Spark

  • Data transformation & aggregation with Spark SQL

  • Scalable ETL development and performance tuning

  • Time-series analysis & visualization with Python and Databricks notebooks

  • Data handling and visualization using PySpark, Pandas, and Matplotlib


Data Source

Stanford Open Policing Project: https://openpolicing.stanford.edu/

YouTube Presentation Link: https://www.youtube.com/watch?v=Mco2p-wFxJA


Key Insights:

  • Black and Hispanic drivers were searched more often than White drivers, yet had lower contraband discovery rates.

  • White drivers were more likely to receive warnings, while Black drivers faced higher rates of arrests and citations.

  • The “veil of darkness” hypothesis was supported: racial disparities in search rates narrowed after sunset.

  • Disparities were consistent across several high-volume states including California, Texas, and Florida.
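
The first insight compares two metrics per group: the search rate (searches divided by stops) and the contraband discovery or "hit" rate (searches that found contraband divided by searches). The sketch below shows how those two rates can be computed; it is an illustration in plain Python, not the project's Spark code. The field names subject_race, search_conducted, and contraband_found are assumed here, and the sample records and group labels are made up purely to show the arithmetic.

```python
from collections import defaultdict


def rates_by_group(stops):
    """Return {group: (search_rate, hit_rate)} for an iterable of stop records.

    search_rate = searches / stops
    hit_rate    = searches that found contraband / searches (None if no searches)
    Field names are assumptions made for this sketch.
    """
    counts = defaultdict(lambda: {"stops": 0, "searches": 0, "hits": 0})
    for stop in stops:
        c = counts[stop["subject_race"]]
        c["stops"] += 1
        if stop["search_conducted"]:
            c["searches"] += 1
            if stop["contraband_found"]:
                c["hits"] += 1

    result = {}
    for group, c in counts.items():
        search_rate = c["searches"] / c["stops"]
        hit_rate = c["hits"] / c["searches"] if c["searches"] else None
        result[group] = (search_rate, hit_rate)
    return result


def make_stop(group, searched, found):
    """Build one toy stop record (hypothetical helper for the example)."""
    return {
        "subject_race": group,
        "search_conducted": searched,
        "contraband_found": found,
    }


# Made-up toy records, not real data.
sample = [
    make_stop("group_a", True, True),
    make_stop("group_a", True, False),
    make_stop("group_a", False, False),
    make_stop("group_a", False, False),
    make_stop("group_b", True, True),
    make_stop("group_b", False, False),
    make_stop("group_b", False, False),
    make_stop("group_b", False, False),
]

rates = rates_by_group(sample)
for group, (search_rate, hit_rate) in sorted(rates.items()):
    print(group, search_rate, hit_rate)
```

In this toy sample, group_a is searched twice as often as group_b but has a lower hit rate, which is the pattern the insight describes. At the scale of millions of records, the same counts would more likely be produced with a grouped aggregation in Spark than with a Python loop.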


MS Business Analytics and Information Systems

University of South Florida
