gblasd/StatisticalTechniquesAndDataMining

Diploma in Statistical Techniques and Data Mining

This repository stores the notebooks developed during the diploma program. Topics per module:

  • Module 1 - Database Design: I learned to design and create databases using MySQL, DDL, and DML.
  • Module 2 - Statistical Models: I learned about probability, distributions, random numbers, random variables, and their real-world uses, using Python as the programming language.
  • Module 3 - Regression and Time Series: This was my first course on time series, where I learned about models such as ARIMA. I also learned linear and logistic regression through examples, using Python as the programming language.
  • Module 4 - Data Mining: This module is my favourite. First I learned about data warehouses and how they are used for data analysis. I also used classification and regression models, along with related techniques, such as decision trees, PCA, neural networks, linear regression, and k-means, among others. In this module I used XGBoost to build a binary classification model.
  • Module 5 - Stochastic Simulation: I used the ARENA simulation software to simulate a process in which random numbers drawn from different distributions are used to generate times.
  • Module 6 - Analysis of Variance, Factorial and Correspondence: Models such as PCA, agglomerative clustering, hierarchical clustering, discriminant analysis, k-means, and factor analysis, using R as the programming language.
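
As a small illustration of the random-number topics in Modules 2 and 5, the sketch below draws times from different distributions with Python's standard `random` module. It is not taken from the course notebooks: the distributions, their parameters, and the function name `sample_times` are assumptions chosen only for illustration.

```python
import random
import statistics

def sample_times(n, seed=42):
    """Draw n interarrival and n service times (hypothetical parameters)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Exponential interarrival times with rate 0.5 per minute (mean 2 minutes).
    interarrival = [rng.expovariate(0.5) for _ in range(n)]
    # Triangular service times between 1 and 3 minutes, most likely 2.
    service = [rng.triangular(1.0, 3.0, 2.0) for _ in range(n)]
    return interarrival, service

interarrival, service = sample_times(10_000)
print("mean interarrival:", round(statistics.mean(interarrival), 2))
print("mean service:", round(statistics.mean(service), 2))
```

With enough samples, both sample means should land close to the theoretical mean of 2 minutes, which is a quick sanity check on the generated times.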

I liked this diploma program because I learned a lot of theory and many models for analyzing data: how to apply models to different problems, and which tools can be used to fit them. For me, the best tool is Python: there is a lot of information about the language, such as documentation, forums, and video tutorials, and it is very easy to learn. I came to understand how to interpret the models and how to move data from a database into the selected model. Viewing the data through graphs generated with tools such as Power BI, Python, and R makes it possible to see what is happening in the data.

Something very important is the ETL process: without it, it is very difficult to fit the models and get good results.
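
A minimal ETL sketch, under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the table `measurements`, its columns, the demonstration rows, and the function names are all hypothetical rather than taken from the course notebooks. It extracts raw rows, drops incomplete ones and converts types, then loads the result into feature rows and labels ready for a model.

```python
import sqlite3

def extract(conn):
    """Extract: read the raw rows from the (hypothetical) source table."""
    query = "SELECT age, income, label FROM measurements ORDER BY rowid"
    return conn.execute(query).fetchall()

def transform(rows):
    """Transform: drop incomplete rows and convert text fields to numbers."""
    clean = []
    for age, income, label in rows:
        if age is None or income is None or label is None:
            continue  # skip rows with missing values
        clean.append((float(age), float(income), int(label)))
    return clean

def load(clean):
    """Load: split the clean rows into features X and labels y."""
    X = [[age, income] for age, income, _ in clean]
    y = [label for _, _, label in clean]
    return X, y

# Build a tiny in-memory database with made-up rows for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (age TEXT, income TEXT, label INTEGER)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("34", "1200.5", 1), (None, "900", 0), ("51", "2300", 0)],
)

X, y = load(transform(extract(conn)))
print(X, y)
conn.close()
```

Keeping the three steps as separate functions makes it easy to check each one on its own before any model is fitted to `X` and `y`.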
