"The goal is to turn data into information, and information into insight."
— Carly Fiorina


Big Data: Stanford Policing Project
Project Description
This project examined racial disparities in traffic stops using large-scale data from the Stanford Open Policing Project. Millions of records were processed in Databricks using Apache Spark to explore stop outcomes, search rates, and arrest probabilities across racial groups. The analysis aimed to uncover potential systemic bias in law enforcement practices across multiple U.S. states.
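As a rough illustration of the kind of aggregation described above, the sketch below computes a per-group search rate (searches per stop) and hit rate (contraband found per search) in plain Python. It is not the project's Spark code. The field names `subject_race`, `search_conducted`, and `contraband_found` are assumed here, the helper `rates_by_group` is hypothetical, and the rows are synthetic.

```python
from collections import defaultdict


def rates_by_group(stops):
    """Return {race: (search_rate, hit_rate)} from an iterable of stop dicts."""
    counts = defaultdict(lambda: {"stops": 0, "searches": 0, "hits": 0})
    for stop in stops:
        c = counts[stop["subject_race"]]
        c["stops"] += 1
        if stop["search_conducted"]:
            c["searches"] += 1
            if stop["contraband_found"]:
                c["hits"] += 1

    result = {}
    for race, c in counts.items():
        search_rate = c["searches"] / c["stops"]
        # The hit rate is undefined for a group with no searches.
        hit_rate = c["hits"] / c["searches"] if c["searches"] else None
        result[race] = (search_rate, hit_rate)
    return result


# Synthetic rows for illustration only; these are not project results.
toy_stops = [
    {"subject_race": "A", "search_conducted": True, "contraband_found": True},
    {"subject_race": "A", "search_conducted": False, "contraband_found": False},
    {"subject_race": "A", "search_conducted": False, "contraband_found": False},
    {"subject_race": "A", "search_conducted": False, "contraband_found": False},
    {"subject_race": "B", "search_conducted": True, "contraband_found": False},
    {"subject_race": "B", "search_conducted": True, "contraband_found": True},
    {"subject_race": "B", "search_conducted": False, "contraband_found": False},
    {"subject_race": "B", "search_conducted": False, "contraband_found": False},
]

rates = rates_by_group(toy_stops)
for race, (search_rate, hit_rate) in sorted(rates.items()):
    print(race, search_rate, hit_rate)
```

In this toy data, group B is searched more often than group A but with a lower hit rate, which is the pattern an outcome comparison of this kind looks for. At the scale of millions of records, the same counts would normally come from a grouped aggregation in Spark rather than a Python loop.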
Skills Showcased:
Distributed data processing using Apache Spark
Data transformation & aggregation with Spark SQL
Scalable ETL development and performance tuning
Time-series analysis & visualization with Python and Databricks notebooks
Data handling and visualization using PySpark, Pandas, and Matplotlib
Data Source
Stanford Open Policing Project: https://openpolicing.stanford.edu/
YouTube Presentation Link: https://www.youtube.com/watch?v=Mco2p-wFxJA
Key Insights:
Black and Hispanic drivers were searched more often than White drivers, yet had lower contraband discovery rates.
White drivers were more likely to receive warnings, while Black drivers faced higher rates of arrests and citations.
The “veil of darkness” hypothesis was supported: racial disparities in search rates narrowed after sunset, when a driver's race is harder to observe.
Disparities were consistent across several high-volume states including California, Texas, and Florida.
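The daylight versus darkness comparison in the insights above could be sketched as follows. This is a simplified illustration, not the project's method: `is_dark` is a hypothetical precomputed flag (a real analysis would derive it from each stop's time and the local sunset time, and would usually control for time of day), `search_rate_gap` and `make_rows` are hypothetical helpers, and all rows are synthetic.

```python
def search_rate_gap(stops, group_a, group_b):
    """Return {is_dark: search_rate(group_a) - search_rate(group_b)}."""
    totals = {}  # (is_dark, race) -> [stop_count, search_count]
    for stop in stops:
        key = (stop["is_dark"], stop["subject_race"])
        tally = totals.setdefault(key, [0, 0])
        tally[0] += 1
        tally[1] += int(stop["search_conducted"])

    gaps = {}
    for is_dark in (False, True):
        a = totals.get((is_dark, group_a))
        b = totals.get((is_dark, group_b))
        if a and b:  # skip a lighting condition with no stops for either group
            gaps[is_dark] = a[1] / a[0] - b[1] / b[0]
    return gaps


def make_rows(race, is_dark, n_stops, n_searches):
    """Build synthetic stops; the first n_searches of them include a search."""
    return [
        {"subject_race": race, "is_dark": is_dark, "search_conducted": i < n_searches}
        for i in range(n_stops)
    ]


# Synthetic data for illustration only; these are not project results.
toy_stops = (
    make_rows("A", False, 10, 4)
    + make_rows("B", False, 10, 2)
    + make_rows("A", True, 10, 3)
    + make_rows("B", True, 10, 2)
)

gaps = search_rate_gap(toy_stops, "A", "B")
print("daylight gap:", round(gaps[False], 3))
print("darkness gap:", round(gaps[True], 3))
```

A smaller gap under darkness than in daylight, as in this toy data, is the direction of change the insight above describes.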