Genomic Selection Demo

The standard genetic model assumes that the phenotype ($y$) is the sum of a genetic component ($g$) and a non-genetic component (residual, $\varepsilon$), that is, $y = g + \varepsilon$. Genomic Selection uses genetic markers that cover the whole genome and can potentially explain all of the genetic variance. These markers are assumed to be in Linkage Disequilibrium (LD) with the QTL, so models that include all markers can estimate breeding values as combinations of the effects of these QTL.

Model

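The response variable $y_i$ for the i-th individual (i=1,...,n) is regressed on a function of p marker genotypes that seeks to approximate the true genetic value of the individual, that is,

$$y_i = f(x_i) + \varepsilon_i,$$

where $x_i = (x_{i1},...,x_{ip})'$ is the vector of marker genotypes of the i-th individual, the function $f(x_i)$ can be parametric or non-parametric, and $\varepsilon_i$ are the residuals, which are usually assumed to be Normally distributed with constant variance $\sigma^2_\varepsilon$.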

Parametric regression

The genotypic value of an individual is estimated using a linear model in which a linear combination of the marker genotypes is used, that is,

$$f(x_i) = \mu + \sum_{j=1}^{p} x_{ij}\beta_j,$$

where $\mu$ is the intercept, $x_{ij}$ is the genotype of the i-th individual at the j-th marker, and $\beta_j$ is the corresponding marker effect.

The model above is difficult to estimate when p is much larger than n, so penalization and regularization approaches are used to overcome this problem. Penalized and regularized solutions can be viewed as posterior solutions in a Bayesian context.

1. Bayesian Ridge Regression (BRR).

This is a penalized regression that assumes that the regression coefficients independently follow a Gaussian (Normal) prior distribution, that is, $\beta_j \sim N(0, \sigma^2_\beta)$. This prior induces shrinkage of the estimates toward zero.
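As a rough illustration of why shrinkage helps when p > n, the following base-R sketch uses simulated 0/1 markers (not the wheat data). It assumes the variance components are known, in which case the posterior mean of the marker effects under the Gaussian prior is the ridge solution with penalty equal to the ratio of residual to marker-effect variance. All object names and values are illustrative.

```r
set.seed(1)
n <- 20; p <- 50                      # more markers than individuals

# Simulated 0/1 marker matrix, centered by column
X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
X <- scale(X, center = TRUE, scale = FALSE)

# Assumed (known) variance components, used only for this illustration
var_b <- 0.1; var_e <- 1
beta <- rnorm(p, mean = 0, sd = sqrt(var_b))
y <- as.vector(X %*% beta + rnorm(n, sd = sqrt(var_e)))
y <- y - mean(y)                      # centering y removes the intercept

# X'X is p x p but its rank cannot exceed n, so least squares has no unique solution
XtX <- crossprod(X)
rank_XtX <- qr(XtX)$rank

# Ridge solution: adding lambda to the diagonal makes the system solvable
lambda <- var_e / var_b
beta_ridge <- as.vector(solve(XtX + lambda * diag(p), crossprod(X, y)))

c(rank = rank_XtX, p = p)
summary(beta_ridge)
```

In a full Bayesian analysis the variance components are not fixed but are assigned priors and sampled together with the marker effects.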

2. Bayesian LASSO.

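It assumes that the regression coefficients have a double-exponential (DE, or Laplace) prior distribution with parameters $\lambda^2$ and $\sigma^2_\varepsilon$. This thick-tailed prior can be represented as an infinite mixture of normal densities scaled by exponential ($Exp$) densities, that is,

$$p(\beta_j | \lambda, \sigma^2_\varepsilon) = \int N(\beta_j | 0, \sigma^2_\varepsilon \tau^2_j) \, Exp(\tau^2_j | \lambda^2/2) \, d\tau^2_j$$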
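The mixture representation can be checked numerically. The sketch below is a Monte Carlo check in base R, assuming the exponential density is parameterized by its rate ($\lambda^2/2$, as in Park & Casella, 2008) and that the resulting double-exponential has density $\frac{\lambda}{2\sigma_\varepsilon} \exp(-\lambda|\beta_j|/\sigma_\varepsilon)$. The values of lambda and sigma_e are arbitrary.

```r
set.seed(2)
N <- 200000
lambda <- 2; sigma_e <- 1

# Step 1: marker-specific scale from an exponential with rate lambda^2 / 2
tau2 <- rexp(N, rate = lambda^2 / 2)

# Step 2: effect drawn from a normal with variance sigma_e^2 * tau2
beta <- rnorm(N, mean = 0, sd = sigma_e * sqrt(tau2))

# Compare with the double-exponential: |beta| is exponential with rate lambda / sigma_e
mc_mean_abs <- mean(abs(beta))
de_mean_abs <- sigma_e / lambda
mc_tail <- mean(abs(beta) > 1)
de_tail <- exp(-lambda * 1 / sigma_e)

round(c(mc_mean_abs = mc_mean_abs, de_mean_abs = de_mean_abs,
        mc_tail = mc_tail, de_tail = de_tail), 3)
```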

3. Bayes A

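The regression effects are assigned another thick-tailed prior, a scaled t distribution with degrees of freedom $df_\beta$ and scale $S_\beta$ parameters. As with the double-exponential, the scaled t distribution can be represented as a mixture of normal densities scaled by a scaled-inverse Chi-squared ($\chi^{-2}$) density, that is,

$$p(\beta_j | df_\beta, S_\beta) = \int N(\beta_j | 0, \sigma^2_{\beta_j}) \, \chi^{-2}(\sigma^2_{\beta_j} | df_\beta, S_\beta) \, d\sigma^2_{\beta_j}$$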
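A similar Monte Carlo check can be sketched for the scaled t prior. Parameterizations of the scaled-inverse Chi-squared differ between texts; this sketch assumes the one in which the marker-specific variance equals the scale S divided by a Chi-squared variable with df degrees of freedom. Under that assumption the marginal effect is a Student t variable multiplied by sqrt(S/df). The values of df_b and S_b are arbitrary.

```r
set.seed(3)
N <- 200000
df_b <- 5; S_b <- 3

# Step 1: marker-specific variances (assumed parameterization: S / chi-squared_df)
sigma2_j <- S_b / rchisq(N, df = df_b)

# Step 2: effects drawn from normals with those variances
beta <- rnorm(N, mean = 0, sd = sqrt(sigma2_j))

# Under the assumption above, beta / sqrt(S_b / df_b) follows a t distribution,
# so 5% of the draws should fall beyond the scaled 97.5% t quantile
t_scale <- sqrt(S_b / df_b)
cutoff <- t_scale * qt(0.975, df = df_b)
mc_tail <- mean(abs(beta) > cutoff)
mc_tail
```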

4. Bayes B

Marker effects are assumed to be equal to zero with probability $\pi$, and with probability $1-\pi$ they are assumed to follow a scaled t distribution, as in the Bayes A model.

5. Bayes C

As in Bayes B, marker effects are assumed to be equal to zero with probability $\pi$, and with probability $1-\pi$ they are assumed to follow a Gaussian distribution, as in the BRR model.
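The difference between the two point-mass mixture priors can be seen by drawing effects from each of them. The sketch below is only an illustration in base R: the values of pi0, df_b, S_b and var_b are arbitrary, and the scaled t slab uses the same assumed parameterization as a t variable multiplied by sqrt(S/df).

```r
set.seed(4)
p <- 10000
pi0 <- 0.9                            # probability that an effect is exactly zero
df_b <- 5; S_b <- 0.05                # scaled t slab (Bayes B), illustrative values
var_b <- 0.01                         # Gaussian slab variance (Bayes C), illustrative

# Indicator of markers with a non-zero effect (probability 1 - pi0)
in_model <- rbinom(p, size = 1, prob = 1 - pi0) == 1

# Bayes B: zero with probability pi0, scaled t otherwise
beta_B <- numeric(p)
beta_B[in_model] <- sqrt(S_b / df_b) * rt(sum(in_model), df = df_b)

# Bayes C: zero with probability pi0, Gaussian otherwise
beta_C <- numeric(p)
beta_C[in_model] <- rnorm(sum(in_model), mean = 0, sd = sqrt(var_b))

# Proportion of markers with zero effect under each prior
c(BayesB = mean(beta_B == 0), BayesC = mean(beta_C == 0))
```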

6. G-BLUP model (RR-BLUP)

The response is modeled as $y = \mu 1 + g + \varepsilon$, and its solution is equivalent to that of the BRR model. It arises when the following substitution is made in the linear model above:

$$g = X\beta$$

It can be shown that the random vector $g$ follows a Normal distribution $N(0, \sigma^2_g G)$, where $G = XX'/p$, $X$ is the matrix of centered and standardized marker genotypes, and $G$ is called the genomic relationship matrix. Under the BRR prior, $\sigma^2_g = p\sigma^2_\beta$.
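The equivalence between the marker-effects (ridge) form and the G-BLUP form can be verified numerically. The following base-R sketch uses simulated markers and assumes known variance components, with the genetic variance set to p times the marker-effect variance. The intercept is handled by centering the response, and all names are illustrative.

```r
set.seed(5)
n <- 30; p <- 200

# Simulated 0/1 markers; drop any monomorphic column before standardizing
M <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
M <- M[, apply(M, 2, var) > 0, drop = FALSE]
X <- scale(M, center = TRUE, scale = TRUE)
p <- ncol(X)

# Genomic relationship matrix
G <- tcrossprod(X) / p

# Simulated, centered phenotype
y <- as.vector(X %*% rnorm(p, sd = 0.1) + rnorm(n))
y <- y - mean(y)

# Assumed variance components
var_e <- 1; var_b <- 0.01
var_g <- p * var_b

# Ridge form: estimate marker effects, then sum them per individual
beta_hat <- solve(crossprod(X) + (var_e / var_b) * diag(p), crossprod(X, y))
g_ridge <- as.vector(X %*% beta_hat)

# G-BLUP form: predict genetic values directly from G
g_gblup <- as.vector(G %*% solve(G + (var_e / var_g) * diag(n), y))

max(abs(g_ridge - g_gblup))           # numerically negligible difference
```

The G-BLUP form only requires solving an n x n system, which is convenient when p is much larger than n.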

Semi-parametric regression

7. RKHS regression.

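The genomic function is expressed as a linear combination of positive semi-definite basis functions called Reproducing Kernels (RK), $K(x_i, x_{i'})$, as follows:

$$f(x_i) = \mu + \sum_{i'=1}^{n} \alpha_{i'} K(x_i, x_{i'})$$

This model can be rewritten as $y = \mu 1 + K\alpha + \varepsilon$, where $K = \{K(x_i, x_{i'})\}$ is an n×n matrix containing the evaluations of the RK function at all pairs of individuals (i,i') and $\alpha = (\alpha_1,...,\alpha_n)'$.

This problem can be solved in a Bayesian fashion by assuming the prior $\alpha \sim N(0, \sigma^2_\alpha K^{-1})$.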

Note: Ridge Regression (and consequently G-BLUP) can be represented as an RKHS model by setting K=G.
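As an illustration of one possible kernel, the sketch below builds a Gaussian kernel from squared Euclidean distances between marker profiles and uses it to predict genetic values. The choice of kernel, the bandwidth h, the scaling of the distances by their mean, and the variance components are all assumptions made for this example; the data are simulated.

```r
set.seed(6)
n <- 40; p <- 100

# Simulated 0/1 markers, centered and standardized
M <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
X <- scale(M[, apply(M, 2, var) > 0, drop = FALSE])

# Squared Euclidean distances between individuals, scaled by their mean
D <- as.matrix(dist(X, method = "euclidean"))^2
D <- D / mean(D)

# Gaussian kernel with an assumed bandwidth parameter h
h <- 0.5
K <- exp(-h * D)

# Simulated, centered phenotype
y <- as.vector(X %*% rnorm(ncol(X), sd = 0.1) + rnorm(n))
y <- y - mean(y)

# With u = K alpha and alpha ~ N(0, var_a * K^-1), u ~ N(0, var_a * K),
# so for known variances the predicted genetic values are
var_e <- 1; var_a <- 1
u_hat <- as.vector(K %*% solve(K + (var_e / var_a) * diag(n), y))

summary(u_hat)
```

Replacing K with the genomic relationship matrix G in the last step gives the G-BLUP prediction, in line with the note above.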

Implementation of models

The models described above will be implemented in R using the packages 'BGLR' (Perez & de los Campos, 2014) and 'rrBLUP' (Endelman, 2011). Using public data, it will be shown how to run the models in the single-environment case and then how to perform a multi-environment analysis with the G-BLUP model, using a marker-by-environment (MxE) approach (Lopez-Cruz et al., 2015) and a Reaction Norm approach (Jarquín et al., 2014), both of which account for GxE interaction.

Data

The data come from CIMMYT's Global Wheat Program. Lines were evaluated for grain yield (each entry corresponds to an average of two plot records) in four different environments; phenotypes (object wheat.Y) were centered and standardized to unit variance within each environment. Each line was genotyped for 1279 Diversity Array Technology (DArT) markers. At each marker, two homozygous genotypes were possible, and these were coded as 0/1. Marker genotypes are given in the object wheat.X. Finally, the matrix wheat.A provides the pedigree relationships between lines, computed from the pedigree records. The data are available in the R package 'BGLR'.

R-packages installation

# Install the packages only if they are not already available
if (!requireNamespace("BGLR", quietly = TRUE)) install.packages("BGLR")
if (!requireNamespace("rrBLUP", quietly = TRUE)) install.packages("rrBLUP")

# Load the packages
library(BGLR)
library(rrBLUP)

Download data

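# Load the wheat data set shipped with BGLR
data(wheat)
X <- wheat.X   # marker genotypes coded as 0/1
Y <- wheat.Y   # grain yield in the four environments
A <- wheat.A   # pedigree relationship matrix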

# Visualize data
head(Y)
X[1:10,1:5]
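With the data loaded, a first model fit could look like the sketch below, which runs a Bayesian Ridge Regression on the first environment. It is a minimal example, not the analysis from the linked pages: the short chain (nIter, burnIn), the seed, and the temporary output prefix are arbitrary choices, and the code is skipped if 'BGLR' is not installed.

```r
if (requireNamespace("BGLR", quietly = TRUE)) {
  library(BGLR)
  data(wheat)                          # loads wheat.X, wheat.Y and wheat.A

  y <- wheat.Y[, 1]                    # grain yield in the first environment

  # Linear predictor with one term: all markers with a Gaussian prior (BRR)
  ETA <- list(list(X = wheat.X, model = "BRR"))

  set.seed(123)
  fm <- BGLR(y = y, ETA = ETA, nIter = 1200, burnIn = 200,
             saveAt = file.path(tempdir(), "brr_"), verbose = FALSE)

  print(cor(fm$yHat, y))               # in-sample fit, not prediction accuracy
  print(fm$varE)                       # posterior mean of the residual variance
  print(head(fm$ETA[[1]]$b))           # posterior means of the first marker effects
}
```

The other parametric models can be requested by changing the model string to "BL", "BayesA", "BayesB" or "BayesC". A kernel term is specified as list(K = K, model = "RKHS").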

Types of analyses

Single-environment (single_environment.md)
Multi-environment (multi_environment.md)

References

de los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. A., & Crossa, J. (2010). Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genetics Research, 92(4), 295–308.
de los Campos, G., Hickey, J. M., Pong-Wong, R., Daetwyler, H. D., & Calus, M. P. L. (2013). Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics, 193(2), 327–345.
Endelman, J. B. (2011). Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. The Plant Genome Journal, 4(3), 250–255.
Habier, D., Fernando, R. L., Kizilkaya, K., & Garrick, D. J. (2011). Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics, 12(186), 1–12.
Jarquín, D., Crossa, J., Lacaze, X., Du Cheyron, P., Daucourt, J., Lorgeou, J., … de los Campos, G. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127(3), 595–607.
Lopez-Cruz, M., Crossa, J., Bonnett, D., Dreisigacker, S., Poland, J., Jannink, J.-L., … de los Campos, G. (2015). Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3: Genes, Genomes, Genetics, 5(4), 569–582.
Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819–1829.
Park, T., & Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681–686.
Perez, P., & de los Campos, G. (2014). Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198(2), 483–495.
R Development Core Team. (2015). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
