DATA 146: Introduction to Data Science

Extra Credit Report - due May 18th by 5PM

Analyze demographic data that describes a larger West African country

For this extra credit assignment, you are asked to analyze a household survey that includes several demographic variables describing a West African country. The dataset is named more_people.csv and has been pinned to the Slack channel #data146_extracredit. A household survey is a random, clustered, and stratified sample drawn from the larger population. To earn the extra credit, you will need to complete the following steps.

  • Using the variable wealthC as your target, apply the following models.
    • linear regression, ridge, lasso
    • kNN, logistic regression, random forest
    • stretch goal: neural networks or another model
  • Change the target to the variable wealthI and apply the same models as above.
  • Assess the output from each of your models. Produce plots that demonstrate the predictive power of each model. Which model performed the best? Justify and support your results with plots and other metrics that you produced.
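
The steps above could be set up along the following lines. This is only a minimal sketch under several assumptions that the assignment does not state: that scikit-learn is the toolkit, that wealthC can be treated as a class label, and that mean cross-validated R² and accuracy are reasonable metrics. The helper names make_demo_frame and compare_models are hypothetical, and the synthetic frame (with invented feature columns age, size and education) only stands in for more_people.csv so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_demo_frame(n=300, seed=0):
    """Synthetic stand-in for more_people.csv; feature names are invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(0, 90, n),
        "size": rng.integers(1, 12, n),
        "education": rng.integers(0, 4, n),
    })
    # wealthI: a made-up continuous score; wealthC: that score cut into 5 classes.
    df["wealthI"] = 10.0 * df["education"] - 2.0 * df["size"] + rng.normal(0, 5, n)
    df["wealthC"] = pd.qcut(df["wealthI"], 5, labels=False) + 1
    return df


def compare_models(X, y, models, scoring, cv=5):
    """Return the mean cross-validated score for each named model."""
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)  # scale features, then fit
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring=scoring).mean()
    return scores


df = make_demo_frame()
# With the real data this would be: df = pd.read_csv("more_people.csv")
X = df.drop(columns=["wealthC", "wealthI"])

# Hyperparameter values below are placeholders, not tuned choices.
regressors = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}
classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=10),
    "logistic": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

r2_scores = compare_models(X, df["wealthC"], regressors, scoring="r2")
acc_scores = compare_models(X, df["wealthC"], classifiers, scoring="accuracy")

for name, score in {**r2_scores, **acc_scores}.items():
    print(f"{name:>14}: {score:.3f}")
```

Switching the target to wealthI would mean passing df["wealthI"] as y. If wealthI turns out to be continuous, the classifiers would need regression counterparts (for example KNeighborsRegressor and RandomForestRegressor) and a regression metric; that choice is left to your own investigation.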

For your extra-credit deliverable, write a 2 to 3 page report that describes your investigation into the demographic composition of this West African country. Introduce your report by describing the data itself in terms of its size and shape, and produce plots that describe it further. Describe each model implementation as part of the report on your investigation into the data. Publish your extra credit report on GitHub as a webpage and share the link in the Slack channel above. Please feel free to ask questions on the Slack channel and I will be happy to provide further direction or advice as needed. Your extra credit report is due May 18th by 5PM.
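
As a starting point for the description of size and shape, something like the following could be used. It is a sketch that assumes pandas; the four-row sample and every value in it are invented purely so the snippet runs without the real file, and only the column names wealthC and wealthI come from the assignment.

```python
import io

import pandas as pd

# Tiny invented sample standing in for more_people.csv.
sample_csv = io.StringIO(
    "age,size,wealthC,wealthI\n"
    "34,5,2,-31250\n"
    "12,7,1,-80400\n"
    "58,3,4,45100\n"
    "27,4,3,2200\n"
)
df = pd.read_csv(sample_csv)  # real data: pd.read_csv("more_people.csv")

# Size and shape: number of observations (rows) and variables (columns).
n_rows, n_cols = df.shape
print(f"{n_rows} observations, {n_cols} variables")

# Column types and summary statistics for each numeric variable.
print(df.dtypes)
print(df.describe())

# How many observations fall in each wealthC value.
class_counts = df["wealthC"].value_counts().sort_index()
print(class_counts)
```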