Update 2015-11-20_computational_pipeline.md
dlebauer committed Nov 21, 2015
1 parent 95f642d commit 09b094d
Showing 1 changed file with 33 additions and 1 deletion.
34 changes: 33 additions & 1 deletion meeting-notes/2015-11-20_computational_pipeline.md
@@ -84,8 +84,40 @@ Can support this project; they need clear specifications and funding

### Define Collaborations among teams (with germplasm, informatics, cyberinfrastructure)

What can teams collaborate on?
* What are people required to do according to milestones, and where are the overlaps?
* How can the reference team help? To what degree are teams interested in using a centralized pipeline, and in sharing data at different points in the pipeline?

* **TODO** Survey of teams to be coordinated by Rob, Rachel, and David

#### Summary

The Cat5 reference platform was not designed to develop bioinformatics tools. Indeed, the focus of TERRA is to develop phenomics technology, which has fallen behind that of genomics.

HPCBio provides a wide range of bioinformatics computing services [1], and the cluster they use is Biocluster [2]. Fees for using Biocluster are reasonable, and both the HPCBio and IGB teams do exceptional work. In addition, IGB has Galaxy and KBase available. The HPCBio team is enthusiastic about our project and available to assist.
Access to the expertise and services of HPCBio adds major value for users of the reference pipeline, but has not been budgeted as an essential feature.

Because the focus of the TERRA program is to advance phenomics, the UI / NCSA team has been built around relevant expertise in ecophysiology, high performance computing, GIS, and computational workflows.
The TERRA program has many experts in bioinformatics. We can thus provide a common collaborative space for implementing cutting-edge pipelines, while also supporting the use of existing, familiar tools and substantial computing power.


NCSA is very generous and supportive of our efforts, and has committed to removing limitations imposed by computing power or storage space. The limiting factor will instead be the number of contributors to a shared infrastructure and the efficiency with which they can collaborate.

#### For individual teams, protected IP

We have a core set of computing resources allocated for use by independent researchers and teams, with no requirement that they share code or data. Indeed, secure computing and storage are important features of our platform. While this resource is more limited, NCSA has staff who can help researchers apply for HPC allocations (through xsede.org) and use and optimize their code for them. XSEDE allocations must be renewed annually, but usually the challenge is getting people to use them.

#### For collaboration and open science objectives

We can provide an unprecedented level of support for open science aimed at sharing data and computing infrastructure.

As for the computing that has been allocated, the CyberGIS group has committed 1 PB of online storage and one million compute hours on a dedicated node, plus access to many times that much computing on a shared queue [3].
Additional requests to support the 10x increase in data volume that has occurred during construction of the Lemnatec field system [4].


[1] http://hpcbio.illinois.edu/content/services-and-fees
[2] http://help.igb.illinois.edu/Biocluster
[3] https://wiki.ncsa.illinois.edu/display/ROGER/ROGER+Technical+Summary
[4] http://terraref.ncsa.illinois.edu/articles/spectral-imaging-data-volume-compression/#on-the-upcoming-data-deluge
