H5weaver requires libraries for HDF5 files. On Windows, these are bundled with the rhdf5
package. On Linux/Unix, you will need to first install hdf5 libraries.
On Debian/Ubuntu, this can usually be accomplished with:
sudo apt-get install libhdf5-dev
or, on RHEL/CentOS/Fedora, with:
sudo yum install hdf5-devel
Once hdf5 libraries are available, you can proceed to installation of rhdf5.
The rhdf5 package is provided through Bioconductor, and can be installed using:
if (!"BiocManager" %in% .packages(all.available = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("rhdf5")
H5weaver also requires the data.table, ids, and Matrix packages, which are available on CRAN and should be automatically installed by install_github().
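Before installing, it can help to see which of these dependencies are already present. The following is a minimal sketch using base R's requireNamespace(); the variable names are illustrative and are not part of H5weaver.

```r
# Packages named above as H5weaver dependencies
required_pkgs <- c("rhdf5", "data.table", "ids", "Matrix")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(
  required_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

# Names of any packages that still need to be installed
missing_pkgs <- required_pkgs[!is_installed]
print(missing_pkgs)
```

An empty result means all four packages were found in the current library paths.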
This package can be installed from GitHub using the devtools package.
Because this is a private repository, you may first need to register your GitHub personal access token (PAT):
Sys.setenv(GITHUB_PAT = "your-access-token-here")
devtools::install_github("aifimmunology/H5weaver")
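If the installation fails with an authentication error, a quick check is whether the token is visible to the R session at all. This is a small sketch using only base R; it does not validate the token against GitHub, it only confirms that the GITHUB_PAT environment variable is non-empty.

```r
# Read the token from the environment; Sys.getenv() returns "" when unset
pat <- Sys.getenv("GITHUB_PAT")

if (nzchar(pat)) {
  # Report only the length so the token itself is never printed
  message("GITHUB_PAT is set (", nchar(pat), " characters).")
} else {
  message("GITHUB_PAT is not set; installing a private repository will likely fail.")
}
```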
The structure of AIFI .h5 files differs somewhat from files supplied directly by CellRanger. The major difference is the addition of richer metadata and additional results (e.g. from Cell Hashing).
A summary of the structure is available in Google Sheets here: 10x .h5 file structure
H5weaver allows for reading of the contents of an HDF5 file in multiple ways. To demonstrate, we'll use a file stored in the H5weaver package. You can obtain the path to this file using:
h5_file <- system.file("testdata/well1.h5", package = "H5weaver")
We can read an .h5 file directly into a Seurat object using read_h5_seurat():
library(H5weaver)
so <- read_h5_seurat(h5_file)
This function places the RNA-seq counts in the "RNA" assay, and CITE-seq ADT counts (if present) in the "ADT" assay.
Likewise, we can read directly into a SingleCellExperiment object for use with Bioconductor packages using read_h5_sce():
library(H5weaver)
sce <- read_h5_sce(h5_file)
Note that this requires a recent version of SingleCellExperiment (>= 1.8.0) so that ADTs are handled correctly.
In this case, RNA-seq counts are stored in the "counts" slot of the SingleCellExperiment, and CITE-seq ADT counts (if present) are stored in the altExp "ADT" slot.
There is a convenience function to directly read the main cell x gene matrix from the HDF5 file, read_h5_dgCMatrix():
library(H5weaver)
mat <- read_h5_dgCMatrix(h5_file)
Note: By default, this matrix will be 1-indexed for convenient use in R. If you would rather retrieve a 0-indexed matrix, set the index1 parameter to FALSE:
mat <- read_h5_dgCMatrix(h5_file,
                         index1 = FALSE)
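To make the indexing difference concrete, here is a small sketch that uses only the Matrix package and is independent of H5weaver. It shows how the same non-zero positions can be written with 1-based or 0-based indices in Matrix::sparseMatrix(). The values are invented, and whether read_h5_dgCMatrix() handles its index1 parameter in exactly this way is an assumption.

```r
library(Matrix)

# Three non-zero values in a 3 x 2 matrix, described with 1-based positions
rows_1 <- c(1, 3, 2)
cols_1 <- c(1, 1, 2)
vals   <- c(5, 7, 9)

m_one <- sparseMatrix(i = rows_1, j = cols_1, x = vals,
                      dims = c(3, 2), index1 = TRUE)

# The same positions written 0-based, as is common in C-style storage
m_zero <- sparseMatrix(i = rows_1 - 1, j = cols_1 - 1, x = vals,
                       dims = c(3, 2), index1 = FALSE)

# Compare the two results as ordinary dense matrices
same <- identical(as.matrix(m_one), as.matrix(m_zero))
print(same)
```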
A convenience function is provided to retrieve all cell/observation-based metadata, read_h5_cell_meta():
cell_meta <- read_h5_cell_meta(h5_file)
Note that for the test dataset, this is only the cell barcodes, as additional metadata are not present.
A similar function is also provided for gene/feature-based metadata, read_h5_feature_meta():
feat_meta <- read_h5_feature_meta(h5_file)
To read the entirety of an HDF5 file as a list object, use h5dump():
library(H5weaver)
h5_list <- h5dump(h5_file)
str(h5_list)
This is a very raw representation of the contents of these HDF5 files. You may want to convert the major components to a sparse matrix (for cell x gene counts), and a data.frame (for metadata):
h5_list <- h5_list_convert_to_dgCMatrix(h5_list,
                                        target = "matrix")
mat <- h5_list$matrix_dgCMatrix
feature_metadata <- as.data.frame(h5_list$matrix$features[-1])
Now, mat will be a dgCMatrix with genes as rows and barcodes as columns, and feature_metadata will be a data.frame with genes as rows and various metadata as columns.
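For readers curious about what such a conversion involves, the sketch below builds a dgCMatrix by hand from a mock list. It assumes the compressed sparse column layout used by 10x Genomics .h5 files (data, indices, indptr, and shape under the matrix group). The mock values and element names are invented for illustration, and this is not necessarily how h5_list_convert_to_dgCMatrix() is implemented.

```r
library(Matrix)

# Mock of the nested list a raw dump might produce (all values are invented)
h5_mock <- list(
  matrix = list(
    barcodes = c("AAAC-1", "AAAG-1", "AACT-1"),
    data     = c(4, 1, 2, 6),          # non-zero counts
    indices  = c(0L, 2L, 1L, 0L),      # 0-based row (gene) index of each count
    indptr   = c(0L, 2L, 3L, 4L),      # offset in data where each column (cell) starts
    shape    = c(3L, 3L),              # genes x cells
    features = list(id   = c("ENSG01", "ENSG02", "ENSG03"),
                    name = c("GeneA", "GeneB", "GeneC"))
  )
)

m <- h5_mock$matrix

# i holds row indices, p holds column pointers; index1 = FALSE marks i as 0-based
mat <- sparseMatrix(
  i = m$indices, p = m$indptr, x = m$data,
  dims = m$shape, index1 = FALSE,
  dimnames = list(m$features$id, m$barcodes)
)

# Feature vectors of equal length become columns of a data.frame
feature_metadata <- as.data.frame(m$features)

print(mat)
```

Supplying both i and p lets sparseMatrix() derive the column of each value from the pointer vector, which matches how column-compressed data is laid out on disk.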
For this test dataset, there isn't any cell metadata. However, files that are generated by our pipeline will include a substantial metadata set stored in matrix/observations. This can be retrieved with:
cell_metadata <- cbind(data.frame(barcodes = h5_list$matrix$barcodes),
                       as.data.frame(h5_list$matrix$observations))
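Since the test file has no observations, the same pattern can be tried on a mock list. This is a sketch only: the observation names (n_umis, n_genes, well_id) and all values are hypothetical and are not taken from the actual pipeline output.

```r
# Mock barcodes and per-cell observations (names and values are invented)
barcodes <- c("AAAC-1", "AAAG-1", "AACT-1")
observations <- list(
  n_umis  = c(1200L, 850L, 4310L),
  n_genes = c(640L, 410L, 1502L),
  well_id = c("W1", "W1", "W1")
)

# Each list element becomes a column; barcodes are placed first
cell_metadata <- cbind(
  data.frame(barcodes = barcodes),
  as.data.frame(observations)
)

str(cell_metadata)
```

This works because every vector in the list has one entry per barcode; vectors of differing lengths would typically cause as.data.frame() to fail.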
Tests for H5weaver are implemented using the testthat testing suite:
https://testthat.r-lib.org/
To run tests for this package, download the repository from GitHub and run devtools::test() in R from the base directory of the package.
Extra-stringent, CRAN-level package testing can be performed using devtools::check() in R.
This package aims to conform to the tidyverse style guide:
https://style.tidyverse.org/index.html
General information about R package conventions can be found in R Packages:
http://r-pkgs.had.co.nz/