GA_capstone_project

This project is the capstone submission for the General Assembly Data Science Immersive course (DSI-4).

Project workflow, organised in document order:

  • Part 1: Identify / Pitch
  • Part 2: Acquire, Parse
  • Part 3: Mine, Refine
  • Part 4: Build
  • Part 5: Predict
  • Part 6: Present

Part 1: Identify

IDENTIFY: Understand the problem

  • Identify business/product objectives.
  • Identify and hypothesize goals and criteria for success.
  • Create a set of questions to help you identify the correct data set.

Pitch us on potential ideas for a data-driven project. Think of topics you’re passionate about, knowledge you’re familiar with, or problems relevant to industries you’d like to work with. What questions do you want to answer?

Part 2: Acquire + Parse

ACQUIRE: Obtain the data

Ideal Data vs. Available Data: oftentimes we start by identifying the ideal data we would want for a project.

Data for predictions: the Foursquare API. Data for modelling: an XML file of labelled data from META-SHARE. A sketch of both acquisition steps follows below.
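To make the acquisition concrete, here is a minimal sketch of both steps, assuming the Foursquare v2 venues/search endpoint and a locally downloaded XML file; the credentials, search parameters, file name, and XML tag names are all placeholders, not the project's actual values.

    import requests
    import xml.etree.ElementTree as ET

    # Foursquare API: venue search (credentials and parameters are placeholders)
    resp = requests.get(
        "https://api.foursquare.com/v2/venues/search",
        params={
            "client_id": "YOUR_CLIENT_ID",        # placeholder credential
            "client_secret": "YOUR_CLIENT_SECRET",
            "v": "20180401",                      # API version date
            "near": "London",                     # hypothetical search area
            "query": "restaurant",                # hypothetical query
        },
    )
    venues = resp.json()["response"]["venues"]

    # XML file of labelled data: tag and attribute names are hypothetical
    tree = ET.parse("labelled_data.xml")
    records = [
        (elem.findtext("text"), elem.get("label"))
        for elem in tree.getroot().iter("record")
    ]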

Some typical questions at this stage may include:

  • Identifying the right data set(s)
  • Is there enough data?
  • Does it appropriately align with the question/problem statement?
  • Can the dataset be trusted? How was it collected?
  • Is this dataset aggregated? Can we use the aggregation or do we need to get it pre-aggregation?
  • Assessing resources, requirements, assumptions, and constraints

PARSE: Understand the data

Common tasks at this step include (a pandas sketch follows this list):

  • Reading any documentation provided with the data (e.g. a data dictionary)
  • Performing exploratory surface analysis via filtering, sorting, and simple visualizations
  • Describing data structure and the information being collected
  • Exploring variables and data types
  • Assessing preliminary outliers, trends
  • Verifying the quality of the data (feedback loop back to Part 1)
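A minimal surface-analysis pass with pandas, assuming a CSV export and a hypothetical "rating" column; the file and column names are stand-ins for the project's actual data.

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("data.csv")        # hypothetical file name

    df.info()                           # data types and non-null counts
    print(df.describe())                # summary statistics, preliminary outliers
    print(df.head())                    # eyeball the structure

    # Simple filtering and sorting to survey the data
    print(df.sort_values("rating", ascending=False).head(10))
    print(df["rating"].isnull().sum())  # how much is missing?

    df["rating"].hist()                 # quick look at the distribution
    plt.show()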

Part 3: Mine + Refine

MINE: Prepare, structure, & clean the data

Often, our data will need to be cleaned prior to performing our analysis. Common tasks at this step include (a cleaning sketch follows this list):

  • Sampling the data and determining a sampling methodology
  • Iterating on and exploring outliers and null values
  • Reviewing qualitative vs. quantitative data
  • Formatting and cleaning data in Python (e.g. dates, number signs, formatting)
  • Defining how to appropriately address missing values (cleaning)
  • Categorising, manipulating, slicing, formatting, and integrating data
  • Formatting and combining different data points, separating columns, etc.
  • Determining the most appropriate aggregations and cleaning methods
  • Creating necessary derived columns from the data (new data)
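A cleaning sketch covering several of the bullets above; the column names ("date", "price", "category") are hypothetical, and the choices (median imputation, dropping rows without a label) are illustrative rather than the project's actual decisions.

    import pandas as pd

    df = pd.read_csv("data.csv")        # hypothetical file name

    # Formatting: parse dates, strip currency signs from a numeric column
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["price"] = (
        df["price"].astype(str).str.replace("£", "", regex=False).astype(float)
    )

    # Missing values: drop rows missing the label, impute a numeric feature
    df = df.dropna(subset=["category"])
    df["price"] = df["price"].fillna(df["price"].median())

    # Derived column: extract a new feature from an existing one
    df["month"] = df["date"].dt.month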

REFINE: Exploratory Data Analysis & Iteration

Exploratory data analysis and descriptive statistics allow us to (a short EDA sketch follows this list):

  • Identify trends and outliers
  • Decide how to deal with outliers - excluding, filtering, or communicating them
  • Apply descriptive and inferential statistics
  • Determine initial visualization techniques
  • Document and capture knowledge
  • Choose visualization techniques for different data types
  • Transform data
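As one concrete example, descriptive statistics plus the 1.5 * IQR rule give a first pass at outliers, and the visualisation can be chosen by data type (boxplot for a numeric spread, countplot for categories); the file and column names are again placeholders.

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("data.csv")        # hypothetical file name

    print(df["price"].describe())       # centre, spread, extremes

    # IQR rule: flag points beyond 1.5 * IQR as candidate outliers
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
    print(len(outliers), "candidate outliers")

    sns.boxplot(x=df["price"])            # spread of a numeric variable
    plt.show()
    sns.countplot(x="category", data=df)  # frequencies of a categorical variable
    plt.show()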

Part 4: Build

BUILD: Create a data model

Some of the steps we will take to build a model include (a scikit-learn sketch follows this list):

  • Selecting the appropriate model
  • Building a model
  • Training and testing our model
  • Evaluating and refining our model
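A generic version of that loop with scikit-learn, assuming a tabular CSV with hypothetical features and target; logistic regression is just a stand-in for whichever model the project selects.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, classification_report

    df = pd.read_csv("data.csv")        # hypothetical file name
    X = df[["price", "month"]]          # hypothetical features
    y = df["category"]                  # hypothetical target

    # Hold out a test set so evaluation reflects unseen data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Evaluate, then refine (features, model choice, hyperparameters) and repeat
    preds = model.predict(X_test)
    print(accuracy_score(y_test, preds))
    print(classification_report(y_test, preds))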
