JockFax32/eda

Exploratory Data Analysis

Exploratory data analysis (EDA) is the practice of investigating data to understand its fundamental structure. This process often leverages visualization to quickly understand univariate distributions and relationships between variables. Initial EDA asks basic questions, including:

  • How large is the dataset (rows, columns)?
  • What are the variables present in the dataset?
  • What is the data type of each variable?
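As a minimal sketch of these first checks, the snippet below inspects a small in-memory dataset using only the Python standard library. The dataset, its column names (`species`, `mass_g`, `count`), and the helper functions are hypothetical and chosen purely for illustration; in practice a data frame library would typically do this work.

```python
# Hypothetical toy dataset: a list of records, one dict per row.
# Column names and values are illustrative assumptions only.
rows = [
    {"species": "a", "mass_g": 12.5, "count": 3},
    {"species": "b", "mass_g": 9.0, "count": 5},
    {"species": "a", "mass_g": None, "count": 2},
]


def describe_shape(rows):
    """Return (number of rows, number of columns)."""
    n_columns = len(rows[0]) if rows else 0
    return len(rows), n_columns


def column_types(rows):
    """Map each column to the set of type names seen among non-missing values."""
    types = {}
    for row in rows:
        for name, value in row.items():
            seen = types.setdefault(name, set())
            if value is not None:
                seen.add(type(value).__name__)
    return types


print(describe_shape(rows))   # how large is the dataset?
print(list(rows[0].keys()))   # which variables are present?
print(column_types(rows))     # what is the data type of each variable?
```

A column that reports more than one type name is often worth a closer look, since it may indicate inconsistent encoding.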

Following that basic information, it's common to dive deeper into particular variables, evaluating:

  • What is the distribution of the variable?
  • Is the variable ever missing (and if so, why)?
  • What are the basic summary statistics (mean, median, standard deviation) of my variable, and what is its range (min/max)?
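The per-variable questions above can be sketched roughly as follows, again with the standard library only. The values, the use of `None` to mark missing entries, and the function names `summarize` and `histogram` are assumptions made for this example.

```python
import statistics
from collections import Counter

# Hypothetical variable; None marks a missing value (an assumption of this sketch).
values = [4.0, 5.5, None, 6.0, 5.5, None, 7.0]


def summarize(values):
    """Count missing entries and compute summary statistics on the rest."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "n_missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


def histogram(values, bin_width):
    """Count non-missing values per bin, keyed by the lower edge of each bin."""
    present = [v for v in values if v is not None]
    counts = Counter((v // bin_width) * bin_width for v in present)
    return dict(sorted(counts.items()))


print(summarize(values))
print(histogram(values, bin_width=1.0))
```

The binned counts are a crude text stand-in for a histogram plot. The code can report how often a value is missing, but why it is missing usually has to be answered from knowledge of how the data were collected.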

Throughout this initial process, one often develops more specific questions about each variable, or about the dataset more generally. For example:

  • Is the distribution of my variable consistent across groupings?
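One simple way to start on a grouping question is to compute the same summary statistics within each group and compare them. The sketch below assumes hypothetical `(group label, value)` pairs; the labels and numbers are made up for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical (group label, value) pairs.
records = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0), ("b", 12.0)]


def summarize_by_group(records):
    """Collect values per group, then summarize each group separately."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {
        label: {
            "n": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
        }
        for label, vals in groups.items()
    }


for label, summary in summarize_by_group(records).items():
    print(label, summary)
```

Large differences between groups in these summaries suggest the distribution is not consistent across groupings, though a plot of each group's distribution usually gives a fuller picture.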

Finally, relationships between variables may be assessed:

  • Is there a correlation between these two (or any two) variables?
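As one possible way to quantify such a relationship, the sketch below computes the Pearson correlation coefficient by hand. Pearson's r is a choice made for this example (it measures linear association only), and the function name and sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))
```

Values near 1 or -1 indicate a strong linear relationship and values near 0 indicate little linear relationship. A scatter plot is still worth checking, because nonlinear patterns can hide behind a small coefficient.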

Exploratory data analysis is a crucial step to understanding your data prior to any statistical analysis.
