layout | title | permalink |
---|---|---|
home | R for non-technical PhD researchers | / |
- Installing R and RStudio
- Navigating the RStudio interface
- Basic R syntax: Variables, data types, operators, and functions
- Writing and running R scripts
- Key libraries for data science in R
- Vectors, Matrices, Lists, and Data Frames
- Indexing, subsetting, and manipulating data
- Importing and exporting data (CSV, Excel, SPSS, etc.)
- Basic exploratory analysis (summary statistics, structure, and head/tail functions)
- Filtering, arranging, mutating, and summarizing data
- Grouped operations and pipelines using `%>%`
- Reshaping data with pivot_longer() and pivot_wider()
- Joining datasets: inner, outer, left, and right joins
- Hands-on data cleaning exercise
- Basic plots: Histograms, scatter plots, bar charts, and box plots
- Advanced visualizations: Heatmaps, faceted plots, and density plots
- Customizing plots with themes, annotations, and labels
- Interactive visualizations with `plotly`
- Case study: Visualizing relationships in a real dataset
- Measures of central tendency and variability
- Identifying outliers and missing data
- Visualizing distributions and relationships (e.g., correlation plots)
- Preparing datasets for statistical analysis
- Introduction to hypothesis testing
- One-sample and two-sample t-tests
- Paired t-tests and their applications
- Chi-square tests for independence
- Non-parametric tests: Wilcoxon and Mann-Whitney U tests
- Analysis of Variance (ANOVA): One-way and two-way
- Post hoc tests (e.g., Tukey’s HSD)
- Simple and multiple linear regression analysis
- Logistic regression for binary outcomes
- Case study: Predicting outcomes using regression models
- Principal Component Analysis (PCA) for dimensionality reduction
- Factor analysis and interpretation of factors
- Cluster analysis: k-means and hierarchical clustering
- Case study: Clustering research observations
- Overview of supervised and unsupervised learning
- Data preprocessing: Scaling, normalization, and feature engineering
- Splitting datasets into training, testing, and validation sets
- Implementing cross-validation and hyperparameter tuning
- Decision trees and random forests
- k-Nearest Neighbors (k-NN)
- Support Vector Machines (SVM)
- Performance metrics: Confusion matrix, accuracy, precision, recall, F1 score
- Case study: Classifying research observations
- Advanced regression techniques: Ridge, Lasso, and Elastic Net
- Regression trees and boosting methods (e.g., XGBoost, LightGBM)
- Hands-on project: Regression modeling for real-world research data
- k-means and hierarchical clustering revisited
- Density-based clustering (DBSCAN) and Gaussian Mixture Models
- Evaluating clustering performance
- Case study: Discovering patterns in research datasets
- Ensemble learning: Bagging and Boosting
- Neural network basics with R (e.g., `keras` and `tensorflow` packages)
- Time series forecasting using ARIMA and Prophet
- Hands-on project: Applying advanced ML techniques
- Creating dynamic reports with R Markdown
- Exporting to PDF, Word, and HTML
- Introduction to Shiny apps for interactive research tools
- Best practices for reproducible research workflows
- Students present their final projects, integrating statistical or machine learning methods
- Feedback and discussion on projects
- Exploring advanced R tools for domain-specific applications
- Course summary and future learning resources