H5weaver requires libraries for HDF5 files. On Windows, these are bundled with the rhdf5 package. On Linux/Unix, you will need to first install hdf5 libraries. This can usually be accomplished with:
sudo apt-get install libhdf5-dev
or
sudo yum install hdf5-devel
Once hdf5 libraries are available, you can proceed to installation of rhdf5.
The rhdf5 package is provided through Bioconductor, and can be installed using:
if (!"BiocManager" %in% .packages(all.available = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("rhdf5")
H5weaver also requires the data.table, ids, and Matrix packages, which are available on CRAN and should be automatically installed by install_github().
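If installation fails partway, it can help to check which of the packages named above are already present. The following is a minimal sketch using only base R; the variable names required_pkgs, is_installed, and missing_pkgs are illustrative and not part of H5weaver.

```r
# Packages named in this README as requirements for H5weaver
required_pkgs <- c("rhdf5", "data.table", "ids", "Matrix")

# requireNamespace() reports TRUE/FALSE for each package
# without attaching it to the search path
is_installed <- vapply(
  required_pkgs,
  requireNamespace,
  logical(1),
  quietly = TRUE
)

missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed from CRAN or, for rhdf5, through BiocManager as described above.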
This package can be installed from GitHub using the devtools package:
devtools::install_github("aifimmunology/H5weaver")
The structure of AIFI .h5 files differs somewhat from files supplied directly by CellRanger. The major difference is the addition of richer metadata and additional results (e.g. from Cell Hashing).
A summary of the structure is available in Google Sheets here: 10x .h5 file structure
H5weaver allows for reading of the contents of an HDF5 file in multiple ways. To demonstrate, we'll use a file stored in the H5weaver package. You can obtain the path to this file using:
h5_file <- system.file("testdata/well1.h5", package = "H5weaver")
We can read an .h5 file directly into a Seurat object using read_h5_seurat():
library(H5weaver)
so <- read_h5_seurat(h5_file)
This function places the RNA-seq counts in the "RNA" assay, and CITE-seq ADT counts (if present) in the "ADT" assay.
Likewise, we can read directly into a SingleCellExperiment object for use with Bioconductor packages using read_h5_sce():
library(H5weaver)
sce <- read_h5_sce(h5_file)
Note that this requires a recent version of SingleCellExperiment (>= 1.8.0) so that ADTs are handled correctly.
In this case, RNA-seq counts are stored in the "counts" slot for the SingleCellExperiment, and CITE-seq ADT counts (if present) are stored in the altExp "ADT" slot.
There is a convenience function to directly read the main cell x gene matrix from the HDF5 file, read_h5_dgCMatrix():
library(H5weaver)
mat <- read_h5_dgCMatrix(h5_file)
Note: By default, this matrix will be 1-indexed for convenient use in R. If you would rather retrieve a 0-indexed matrix, set the index1 parameter to FALSE:
mat <- read_h5_dgCMatrix(h5_file,
                         index1 = FALSE)
A convenience function is provided to retrieve all cell/observation-based metadata, read_h5_cell_meta():
cell_meta <- read_h5_cell_meta(h5_file)
Note that for the test dataset, this is only the cell barcodes, as additional metadata are not present.
A similar function is also provided for gene/feature-based metadata, read_h5_feature_meta():
feat_meta <- read_h5_feature_meta(h5_file)
To read the entirety of an HDF5 file as a list object, use h5dump():
library(H5weaver)
h5_list <- h5dump(h5_file)
str(h5_list)
This is a very raw representation of the contents of these HDF5 files. You may want to convert the major components to a sparse matrix (for cell x gene counts), and a data.frame (for metadata):
h5_list <- h5_list_convert_to_dgCMatrix(h5_list,
                                        target = "matrix")
mat <- h5_list$matrix_dgCMatrix
feature_metadata <- as.data.frame(h5_list$matrix$features[-1])
Now, mat will consist of a dgCMatrix with genes as rows and barcodes as columns, and feature_metadata will be a data.frame with genes as rows and various metadata as columns.
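To make the list-to-matrix step more concrete, the sketch below builds a dgCMatrix by hand from mock compressed-sparse-column components using Matrix::sparseMatrix(). The toy values, and the element names data, indices, indptr, and shape (which follow the 10x-style layout), are assumptions for illustration. This is not the internal code of h5_list_convert_to_dgCMatrix().

```r
library(Matrix)

# Mock sparse components (toy values, assumed for illustration)
toy <- list(
  data     = c(5, 1, 2, 7),      # non-zero counts
  indices  = c(0L, 2L, 1L, 2L),  # 0-based row (gene) index of each count
  indptr   = c(0L, 2L, 3L, 4L),  # column pointers: n columns + 1 entries
  shape    = c(3L, 3L),          # genes x barcodes
  barcodes = c("AAAC", "AAAG", "AAAT"),
  features = list(id = c("g1", "g2", "g3"))
)

# index1 = FALSE tells sparseMatrix() that the row indices are 0-based
toy_mat <- sparseMatrix(
  i = toy$indices,
  p = toy$indptr,
  x = toy$data,
  dims = toy$shape,
  index1 = FALSE,
  dimnames = list(toy$features$id, toy$barcodes)
)

print(toy_mat)
```

The result has genes as rows and barcodes as columns, matching the orientation described above.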
For this test dataset, there isn't any cell metadata. However, files that are generated by our pipeline will include a substantial metadata set stored in matrix/observations. This can be retrieved with:
cell_metadata <- cbind(data.frame(barcodes = h5_list$matrix$barcodes),
                       as.data.frame(h5_list$matrix$observations))
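Because the test file has no observations, the following sketch shows the same cbind() pattern on a small mock list. The list shape imitates the relevant part of an h5dump() result, and the observation columns n_reads and hto_category are hypothetical names chosen for illustration, not fields guaranteed to exist in pipeline output.

```r
# Mock list shaped like the matrix group of an h5dump() result.
# Column names under observations are hypothetical.
mock_list <- list(
  matrix = list(
    barcodes = c("AAAC-1", "AAAG-1", "AAAT-1"),
    observations = list(
      n_reads = c(1200L, 950L, 1430L),
      hto_category = c("singlet", "doublet", "singlet")
    )
  )
)

# Barcodes become the first column; each observation becomes one more column
mock_cell_metadata <- cbind(
  data.frame(barcodes = mock_list$matrix$barcodes),
  as.data.frame(mock_list$matrix$observations)
)

str(mock_cell_metadata)
```

Each row of the resulting data.frame corresponds to one barcode, so it can be matched to the columns of the count matrix by barcode.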
Tests for H5weaver are implemented using the testthat testing suite:
https://testthat.r-lib.org/
To run tests for this package, download the repository from GitHub and run devtools::test() in R from the base directory of the package.
Extra-stringent, CRAN-level package testing can be performed using devtools::check() in R.
The license for this package is available in the file LICENSE.txt in this repository on GitHub.
We are not currently supporting this code. We are releasing it to the community AS IS and cannot provide any guarantees of support. The community is welcome to submit issues, but you should not expect an active response.
If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in the file CONTRIBUTING.md in this repository.
This package aims to conform to the tidyverse style guide:
https://style.tidyverse.org/index.html
General information about R package conventions can be found in R Packages:
http://r-pkgs.had.co.nz/