GA_capstone_project

This project is the capstone submission for the General Assembly Data Science Immersive course (DSI-4).

Project workflow, organised in document order:

  • Part 1: Identify / Pitch
  • Part 2: Acquire, Parse
  • Part 3: Mine, Refine
  • Part 4: Build
  • Part 5: Predict
  • Part 6: Present

Part 1: Identify

IDENTIFY: Understand the problem

  • Identify business/product objectives.
  • Identify and hypothesize goals and criteria for success.
  • Create a set of questions to help you identify the correct data set.

Pitch us on potential ideas for a data-driven project. Think of topics you’re passionate about, knowledge you’re familiar with, or problems relevant to industries you’d like to work with. What questions do you want to answer?

Part 2: Acquire + Parse

ACQUIRE: Obtain the data

Ideal Data vs. Available Data: oftentimes we start by identifying the ideal data we would want for a project.

Data for predictions: the Foursquare API. Data for modelling: an XML file of labelled data from META-SHARE. A sketch of both acquisition steps follows below.
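To make the acquisition concrete, here is a minimal sketch of both steps, assuming the Foursquare v2 venues/search endpoint and a locally downloaded XML file; the credentials, search parameters, file name, and XML tag names are all placeholders, not the project's actual values.

    import requests
    import xml.etree.ElementTree as ET

    # Foursquare API: venue search (credentials and parameters are placeholders)
    resp = requests.get(
        "https://api.foursquare.com/v2/venues/search",
        params={
            "client_id": "YOUR_CLIENT_ID",        # placeholder credential
            "client_secret": "YOUR_CLIENT_SECRET",
            "v": "20180401",                      # API version date
            "near": "London",                     # hypothetical search area
            "query": "restaurant",                # hypothetical query
        },
    )
    venues = resp.json()["response"]["venues"]

    # XML file of labelled data: tag and attribute names are hypothetical
    tree = ET.parse("labelled_data.xml")
    records = [
        (elem.findtext("text"), elem.get("label"))
        for elem in tree.getroot().iter("record")
    ]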

Some typical questions at this stage may include:

  • Identifying the right data set(s)
  • Is there enough data?
  • Does it appropriately align with the question/problem statement?
  • Can the dataset be trusted? How was it collected?
  • Is this dataset aggregated? Can we use the aggregation or do we need to get it pre-aggregation?
  • Assessing resources, requirements, assumptions, and constraints

PARSE: Understand the data

Common tasks at this step include (a pandas sketch follows this list):

  • Reading any documentation provided with the data (e.g. a data dictionary)
  • Performing exploratory surface analysis via filtering, sorting, and simple visualizations
  • Describing data structure and the information being collected
  • Exploring variables and data types
  • Assessing preliminary outliers, trends
  • Verifying the quality of the data (feedback loop back to Part 1)
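A minimal surface-analysis pass with pandas, assuming a CSV export and a hypothetical "rating" column; the file and column names are stand-ins for the project's actual data.

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("data.csv")        # hypothetical file name

    df.info()                           # data types and non-null counts
    print(df.describe())                # summary statistics, preliminary outliers
    print(df.head())                    # eyeball the structure

    # Simple filtering and sorting to survey the data
    print(df.sort_values("rating", ascending=False).head(10))
    print(df["rating"].isnull().sum())  # how much is missing?

    df["rating"].hist()                 # quick look at the distribution
    plt.show()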

Part 3: Mine + Refine

MINE: Prepare, structure, & clean the data

Often, our data will need to be cleaned prior to performing our analysis. Common tasks at this step include (a cleaning sketch follows this list):

  • Sampling the data and determining a sampling methodology
  • Iterating on and exploring outliers and null values
  • Reviewing qualitative vs. quantitative data
  • Formatting and cleaning data in Python (e.g. dates, number signs, formatting)
  • Defining how to appropriately address missing values (cleaning)
  • Categorising, manipulating, slicing, formatting, and integrating data
  • Formatting and combining different data points, separating columns, etc.
  • Determining the most appropriate aggregations and cleaning methods
  • Creating necessary derived columns from the data (new data)
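A cleaning sketch covering several of the bullets above; the column names ("date", "price", "category") are hypothetical, and the choices (median imputation, dropping rows without a label) are illustrative rather than the project's actual decisions.

    import pandas as pd

    df = pd.read_csv("data.csv")        # hypothetical file name

    # Formatting: parse dates, strip currency signs from a numeric column
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["price"] = (
        df["price"].astype(str).str.replace("£", "", regex=False).astype(float)
    )

    # Missing values: drop rows missing the label, impute a numeric feature
    df = df.dropna(subset=["category"])
    df["price"] = df["price"].fillna(df["price"].median())

    # Derived column: extract a new feature from an existing one
    df["month"] = df["date"].dt.month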

REFINE: Exploratory Data Analysis & Iteration

Exploratory data analysis and descriptive statistics allow us to (a short EDA sketch follows this list):

  • Identify trends and outliers
  • Decide how to deal with outliers - excluding, filtering, or communicating them
  • Apply descriptive and inferential statistics
  • Determine initial visualization techniques
  • Document and capture knowledge
  • Choose visualization techniques for different data types
  • Transform data
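As one concrete example, descriptive statistics plus the 1.5 * IQR rule give a first pass at outliers, and the visualisation can be chosen by data type (boxplot for a numeric spread, countplot for categories); the file and column names are again placeholders.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("data.csv")        # hypothetical file name

    print(df["price"].describe())       # centre, spread, extremes

    # IQR rule: flag points beyond 1.5 * IQR as candidate outliers
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
    print(len(outliers), "candidate outliers")

    sns.boxplot(x=df["price"])            # spread of a numeric variable
    plt.show()
    sns.countplot(x="category", data=df)  # frequencies of a categorical variable
    plt.show()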

Part 4: Build

BUILD: Create a data model

Some of the steps we will take to build a model include (a scikit-learn sketch follows this list):

  • Selecting the appropriate model
  • Building a model
  • Training and testing our model
  • Evaluating and refining our model
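A generic version of that loop with scikit-learn, assuming a tabular CSV with hypothetical features and target; logistic regression is just a stand-in for whichever model the project selects.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, classification_report

    df = pd.read_csv("data.csv")        # hypothetical file name
    X = df[["price", "month"]]          # hypothetical features
    y = df["category"]                  # hypothetical target

    # Hold out a test set so evaluation reflects unseen data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Evaluate, then refine (features, model choice, hyperparameters) and repeat
    preds = model.predict(X_test)
    print(accuracy_score(y_test, preds))
    print(classification_report(y_test, preds))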
