
Big Data: Stanford Policing Project

Project Description

This project examined racial disparities in traffic stops using large-scale data from the Stanford Open Policing Project. Millions of records were processed in Databricks using Apache Spark to explore stop outcomes, search rates, and arrest probabilities across racial groups. The analysis aimed to uncover potential systemic bias in law enforcement practices across multiple U.S. states.

Skills Showcased:

Distributed data processing using Apache Spark
Data transformation & aggregation with Spark SQL
Scalable ETL development and performance tuning
Time-series analysis & visualization with Python and Databricks notebooks
Data handling and visualization using PySpark, Pandas, and Matplotlib

Data Source

Stanford Open Policing Project: https://openpolicing.stanford.edu/

YouTube Presentation Link: https://www.youtube.com/watch?v=Mco2p-wFxJA

Key Insights:

Black and Hispanic drivers were searched more often than White drivers, yet had lower contraband discovery rates.
White drivers were more likely to receive warnings, while Black drivers faced higher rates of arrests and citations.
The “veil of darkness” hypothesis was supported: racial disparities in search rates narrowed after sunset.
Disparities were consistent across several high-volume states, including California, Texas, and Florida.
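
The first insight compares two per-group quantities: the search rate (searches divided by stops) and the contraband discovery or "hit" rate (searches that found contraband divided by searches). The sketch below shows one way these could be computed. It is plain Python on a handful of invented toy records, not the project's Spark code, and its numbers are not project results. The field names subject_race, search_conducted, and contraband_found, the group labels, and the helper rates_by_group are assumptions made for illustration.

```python
from collections import defaultdict

# Toy records invented for illustration only; they are NOT project data.
# Field names and group labels are assumptions, not the project's schema.
stops = [
    {"subject_race": "group_a", "search_conducted": True,  "contraband_found": True},
    {"subject_race": "group_a", "search_conducted": False, "contraband_found": False},
    {"subject_race": "group_a", "search_conducted": False, "contraband_found": False},
    {"subject_race": "group_a", "search_conducted": False, "contraband_found": False},
    {"subject_race": "group_b", "search_conducted": True,  "contraband_found": True},
    {"subject_race": "group_b", "search_conducted": True,  "contraband_found": False},
    {"subject_race": "group_b", "search_conducted": False, "contraband_found": False},
    {"subject_race": "group_b", "search_conducted": False, "contraband_found": False},
]


def rates_by_group(records):
    """Return {group: (search_rate, hit_rate)} (hypothetical helper).

    search_rate = searches / stops
    hit_rate    = searches that found contraband / searches,
                  or None when the group has no searches.
    """
    counts = defaultdict(lambda: {"stops": 0, "searches": 0, "hits": 0})
    for record in records:
        group_counts = counts[record["subject_race"]]
        group_counts["stops"] += 1
        if record["search_conducted"]:
            group_counts["searches"] += 1
            # A hit only counts when a search actually took place.
            if record["contraband_found"]:
                group_counts["hits"] += 1

    result = {}
    for group, c in counts.items():
        search_rate = c["searches"] / c["stops"]
        hit_rate = c["hits"] / c["searches"] if c["searches"] else None
        result[group] = (search_rate, hit_rate)
    return result


for group, (search_rate, hit_rate) in sorted(rates_by_group(stops).items()):
    print(group, search_rate, hit_rate)
```

In this toy data, group_b is searched twice as often as group_a but its searches find contraband half as often, which is the shape of pattern the first insight describes. At the scale of the real dataset, the same counts would more likely be produced with a grouped aggregation in Spark rather than a Python loop.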
