Final Project: Smart City Data Hackathon

🎯 Learning Objectives

In this final project, you will work in teams to explore, analyze, and visualize urban data to propose data-driven insights or solutions that could improve the functioning of New York City.

You will combine everything you have learned in the course, from data import and cleaning to visualization, spatial analysis, and reproducible reporting.

🧩 The challenge

The City of New York wants to better understand patterns in mobility, safety, and the urban environment. Your task is to use open datasets (such as NYC Taxi data, 311 complaints, weather, or spatial information) to:

Identify a problem or opportunity for the city, analyze it with data, and propose one or more practical, data-informed solutions.

Your project should tell a clear story supported by data, code, and visualization.

🧑‍🤝‍🧑 Team format

  • Work in teams of 2–3 students.
  • Each team chooses its own research question and data sources.
  • Collaboration and creativity are key — technical perfection is less important than originality, storytelling, and insight.

📦 Suggested data sources

You are encouraged to combine several datasets. Here are examples (but you may choose others):

Core dataset

  • NYC Taxi trip data

Additional layers (choose 1–2+)

  • Weather data — NOAA, Open-Meteo API, or other public APIs
  • 311 Service Requests — NYC Open Data Portal
  • Traffic volume, accidents, or speed data
  • Public transportation, bike sharing (CitiBike), or parking data
  • Neighborhood demographics or land use
  • Spatial data — borough boundaries, taxi zones, street networks
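
As a rough illustration of the "API access" route for weather data, the sketch below builds a request URL for Open-Meteo's historical archive endpoint. It is written in Python with the standard library only; the coordinates, date range, and the helper name build_weather_url are assumptions chosen for the example, and the same request could be made from R instead.

```python
from urllib.parse import urlencode

# Assumed endpoint: Open-Meteo's historical weather archive.
ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(latitude, longitude, start_date, end_date):
    """Return a request URL for hourly temperature and precipitation.

    Hypothetical helper for illustration; dates are ISO strings (YYYY-MM-DD).
    """
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": "temperature_2m,precipitation",
        "timezone": "America/New_York",
    }
    # Keep commas and slashes unescaped so the URL stays readable.
    return ARCHIVE_ENDPOINT + "?" + urlencode(params, safe=",/")


# Approximate coordinates for central Manhattan (an assumption for the example).
url = build_weather_url(40.7128, -74.006, "2024-01-01", "2024-01-07")
print(url)

# To download the data, something like the following could be used:
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       weather = json.load(response)
```

The network call is left as a comment so the sketch runs offline; check the provider's documentation for current parameters and usage limits before relying on it.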

🗺️ What your project should include

  1. Data import and integration
    • Combine at least two independent datasets (e.g. taxi + weather, or taxi + 311 complaints).
    • Use advanced import tools (arrow, duckdb, polars, or API access).
  2. Data transformation and exploration
    • Clean and summarize data.
    • Use dplyr, sf, or polars for efficient manipulation.
  3. Spatial or temporal visualization
    • Create at least one map or time series plot.
    • Use ggplot2, sf, or leaflet.
  4. Insights and recommendations
    • Describe findings clearly.
    • Propose 1–2 potential solutions, policies, or improvements for the city.
  5. Reproducibility and storytelling
    • Document your process in a Quarto report.
    • Include code, figures, and narrative in a cohesive story.
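
The first two requirements (integrating two independent datasets, then summarizing them) can be sketched on toy data. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the SQL-style workflow that a tool such as duckdb offers, and the rows, table names, and column names are all made up for the example rather than taken from any real dataset.

```python
import sqlite3

# Made-up toy rows standing in for real taxi and weather extracts.
trips = [
    # (pickup_hour, duration_min)
    ("2024-01-01 08:00", 12.0),
    ("2024-01-01 08:00", 18.0),
    ("2024-01-01 09:00", 25.0),
    ("2024-01-01 09:00", 35.0),
    ("2024-01-01 10:00", 10.0),
]
weather = [
    # (hour, precipitation_mm)
    ("2024-01-01 08:00", 0.0),
    ("2024-01-01 09:00", 4.2),
    ("2024-01-01 10:00", 0.0),
]


def mean_duration_by_rain(trips, weather):
    """Join trips to hourly weather, then average trip duration for wet vs dry hours."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (pickup_hour TEXT, duration_min REAL)")
    con.execute("CREATE TABLE weather (hour TEXT, precipitation_mm REAL)")
    con.executemany("INSERT INTO trips VALUES (?, ?)", trips)
    con.executemany("INSERT INTO weather VALUES (?, ?)", weather)
    rows = con.execute(
        """
        SELECT CASE WHEN w.precipitation_mm > 0 THEN 'rain' ELSE 'dry' END AS rain_flag,
               COUNT(*) AS n_trips,
               AVG(t.duration_min) AS mean_duration_min
        FROM trips AS t
        JOIN weather AS w ON w.hour = t.pickup_hour
        GROUP BY rain_flag
        ORDER BY rain_flag
        """
    ).fetchall()
    con.close()
    return rows


summary = mean_duration_by_rain(trips, weather)
for rain_flag, n_trips, mean_duration in summary:
    print(f"{rain_flag}: {n_trips} trips, mean {mean_duration:.1f} min")
```

In a real project the join key needs care: trip timestamps would first be truncated to the hour (and to the right time zone) so that they line up with the weather records, and an inner join like this one silently drops trips with no matching weather row.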

🌟 Examples of possible directions

  • 🕒 Mobility efficiency: “How do traffic jams and weather affect trip duration?”
  • 💰 Economic behavior: “Which neighborhoods have the highest average tips and why?”
  • 🧭 Accessibility: “Where are the underserved zones with poor transport connectivity?”
  • 🌳 Sustainability: “How can the city optimize taxi demand to reduce CO₂ emissions?”
  • 🧹 Urban services: “Are 311 complaints correlated with low taxi activity, or concentrated in certain areas?”
  • 🗺️ Safety: “Where do taxi accidents cluster, and which routes would be safer?”

📤 Submission

Each team must submit:

  1. A Quarto project (one .qmd or multi-file report) including:
    • Introduction: research question and motivation.
    • Data import and cleaning process.
    • Analysis, visualization, and interpretation.
    • Policy or design recommendations.
  2. A short presentation (5–7 minutes) during the hackathon session:
    • Tell your story visually.
    • Focus on key insights and proposed solutions.
  3. A link to your published Quarto report (on Quarto Pub or GitHub Pages).

🧭 Final note

This project is your opportunity to think like data scientists working for a city, where code meets impact. Be bold, creative, and analytical. Surprise us with something hidden in the data!