Data Science at Scale - Capstone Project

University of Washington via Coursera

Go to Course: https://www.coursera.org/learn/datasci-capstone

Introduction

### Course Review: Data Science at Scale - Capstone Project on Coursera

#### Overview

The "Data Science at Scale - Capstone Project" course on Coursera offers a unique opportunity for students to engage in a real-world data science project that synthesizes all elements of the data science pipeline. Coupled with a partnership with Coursolve, this capstone allows participants to work on projects that have tangible implications for real stakeholders. It's an exciting chance to apply theoretical knowledge in a practical setting and witness the impact of your work firsthand.

#### Syllabus Breakdown

The course is structured around a single, impactful project titled **"Blight Fight."** Throughout the six-week program, students are tasked with building a predictive model to determine the likelihood of building condemnation, a serious issue in urban development.

**Week 1: Project Introduction**
Students are introduced to the Blight Fight project, which emphasizes the real-world significance of predicting building condemnations. The overarching goal is to not only demonstrate technical skill but also to provide solutions that can aid municipalities and stakeholders.

**Week 2: Derive a List of Buildings**
In this week, students dive into the intricacies of data preparation. You'll be provided with sets of incidents that carry location information, requiring you to apply assumptions and grouping techniques to identify the specific buildings affected by these incidents. This foundational step is crucial, as it encompasses both data cleaning and organization.

**Week 3: Construct a Training Dataset**
Here, the focus shifts to constructing a robust training dataset. Students will correlate the identified buildings with ground truth labels derived from permit data. This stage underlines the importance of accurate labeling and its impact on model accuracy.

**Week 4: Train and Evaluate a Simple Model**
In this step, simplicity reigns as you utilize a basic feature set to train your model. Evaluating the performance of this model sets the stage for the enhancements to come. This week emphasizes understanding model evaluation metrics and the iterative nature of model building.

**Week 5: Feature Engineering**
Building upon the previous results, students are encouraged to dig deeper into feature engineering. By deriving additional features, you’ll aim to enhance your model's performance, learning the critical importance of this phase in the data science pipeline.

**Week 6: Final Report**
Concluding the course, students submit a comprehensive report detailing their methodologies, findings, and the potential implications of their work. This final piece not only cements your learning but also serves as a valuable portfolio piece for future career endeavors.

#### Pros and Cons

**Pros:**
- **Real-world application:** This course takes theoretical knowledge and places it in the context of a real-world problem, making the learning experience much more engaging and relevant.
- **Structured Learning Path:** The gradual buildup from basic model building to more advanced feature engineering provides a clear learning trajectory.
- **Collaboration with Stakeholders:** The involvement of partners interested in the outcomes adds a layer of motivation and relevance to the project.

**Cons:**
- **Complexity of the Project:** The challenge of navigating a real-world problem can be daunting for beginners. Prior foundational knowledge in data science may be required for a smoother experience.
- **Time-Intensive:** Engaging with all steps of the data science pipeline requires a significant time commitment, which may be a barrier for some learners balancing other responsibilities.

#### Recommendation

I highly recommend the **Data Science at Scale - Capstone Project** course for anyone looking to take their data science skills to the next level. It’s especially valuable for individuals who wish to make a real impact with their work. The robust curriculum, combined with the encouragement of a professional outcome, makes it a standout offering on Coursera. Whether you're a data science student or a professional seeking to refine your skill set, this course is designed to challenge you and deepen your understanding of data science applications. The hands-on nature of the work will not only prepare you for similar challenges in your career but also enhance your portfolio with a substantially impactful project. Embrace the challenge, and you may discover newfound confidence in your data science abilities!

Syllabus

Project A: Blight Fight

In this project, you will build a model to predict when a building is likely to be condemned. The data is real, the problem is real, and the impact is real.

Week 2: Derive a list of buildings

You are given sets of incidents with location information; you need to use some assumptions to group these incidents by location to identify specific buildings.
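One possible grouping assumption, sketched below, is that incidents whose coordinates round to the same small grid cell belong to the same building. The field names (`id`, `lat`, `lon`), the sample coordinates, and the `precision` parameter are all hypothetical; the course does not prescribe a method, and this is only one simple heuristic among several.

```python
from collections import defaultdict


def group_incidents_by_building(incidents, precision=4):
    """Group incidents whose rounded (lat, lon) fall in the same grid cell.

    Assumption: four decimal places (roughly ten metres of latitude)
    is a reasonable stand-in for "the same building".
    """
    buildings = defaultdict(list)
    for incident in incidents:
        key = (round(incident["lat"], precision),
               round(incident["lon"], precision))
        buildings[key].append(incident["id"])
    return dict(buildings)


# Made-up incidents for illustration only.
incidents = [
    {"id": "v1", "lat": 42.33141, "lon": -83.04581},
    {"id": "c7", "lat": 42.33143, "lon": -83.04579},
    {"id": "v2", "lat": 42.36010, "lon": -83.07020},
]

buildings = group_incidents_by_building(incidents)
group_sizes = sorted(len(ids) for ids in buildings.values())
```

A fixed grid can split one building across two cells when it straddles a cell boundary, so a distance-based clustering or an address match may work better on real data.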

Week 3: Construct a training dataset

Construct a training set by associating each of your buildings with a ground truth label derived from the permit data.
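A minimal sketch of the labeling step, assuming each building already has a key and that a building counts as a positive example when that key appears in the permit data. The building keys, the `permitted_keys` set, and the row layout are hypothetical; how the label should actually be derived from the permits is left to the student.

```python
def build_training_set(buildings, permitted_keys):
    """Attach a 0/1 label to each building: 1 if its key has a permit."""
    rows = []
    for key in sorted(buildings):
        rows.append({
            "building": key,
            "n_incidents": len(buildings[key]),
            "label": 1 if key in permitted_keys else 0,
        })
    return rows


# Toy inputs: building key -> incident ids, plus keys found in permit data.
buildings = {
    "bldg-001": ["v1", "c7"],
    "bldg-002": ["v2"],
    "bldg-003": ["c3", "c4", "v9"],
}
permitted_keys = {"bldg-003"}

training_set = build_training_set(buildings, permitted_keys)
labels = [row["label"] for row in training_set]
```

Keeping the label next to the building key makes it easy to check later that features and labels were joined on the same notion of "building".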

Week 4: Train and evaluate a simple model

Use a trivial feature set to train and evaluate a simple model.
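To make "trivial feature set" and "simple model" concrete, here is a sketch that uses a single assumed feature (the number of incidents per building) and a one-threshold rule chosen on the training data, then scores it on held-out data. The numbers are toy values, and the threshold rule is a stand-in for whatever simple model you prefer, not the course's required approach.

```python
def accuracy(threshold, xs, ys):
    """Fraction of examples where the rule `x >= threshold` matches the label."""
    hits = sum((x >= threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


def fit_threshold(xs, ys):
    """Pick the candidate threshold with the highest training accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(set(xs)):
        acc = accuracy(candidate, xs, ys)
        if acc > best_accuracy:
            best_threshold, best_accuracy = candidate, acc
    return best_threshold


# Toy data: incident counts per building and 0/1 condemnation labels.
train_x = [1, 2, 2, 5, 7, 9]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [1, 3, 6, 8]
test_y = [0, 0, 1, 1]

threshold = fit_threshold(train_x, train_y)
test_accuracy = accuracy(threshold, test_x, test_y)
```

Evaluating on data the model did not see during fitting is the important part; on real, imbalanced labels, accuracy alone can be misleading, so precision and recall are worth reporting as well.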

Week 5: Feature Engineering

Derive additional features and retrain to improve the efficacy of your model.
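As an illustration of deriving additional features, the sketch below turns a building's raw incident list into per-type counts. The incident `type` values (`"violation"`, `"crime"`) and the feature names are assumptions made for this example; the actual incident categories depend on the datasets provided in the course.

```python
from collections import Counter


def engineer_features(incidents):
    """Summarize one building's incidents as a flat feature dictionary."""
    type_counts = Counter(incident["type"] for incident in incidents)
    return {
        "n_incidents": len(incidents),
        # Counter returns 0 for types that never occur.
        "n_violations": type_counts["violation"],
        "n_crimes": type_counts["crime"],
        "n_distinct_types": len(type_counts),
    }


# Made-up incidents for a single building.
building_incidents = [
    {"id": "v1", "type": "violation"},
    {"id": "v2", "type": "violation"},
    {"id": "c7", "type": "crime"},
]

features = engineer_features(building_incidents)
```

After adding features like these, retrain and compare against the simple baseline on the same held-out data, so any improvement can be attributed to the new features.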

Week 6: Final Report

Enter your final report for grading.

Overview

In the capstone, students will engage in a real-world project requiring them to apply skills from the entire data science pipeline: preparing, organizing, and transforming data, constructing a model, and evaluating results. Through a collaboration with Coursolve, each Capstone project is associated with partner stakeholders who have a vested interest in your results and are eager to deploy them in practice. These projects will not be straightforward and the outcome is not prescribed -- you w

Skills

Data Wrangling, Statistics, Data Analysis, Python Programming, R Programming

Reviews

An interesting problem to tackle. I really liked that you started with very raw data and needed to work on many cleaning methods. Good practice for real data science.