Genomic Data Analysis Project
Underpinning genomic research
The Genomic Data Analysis Project addresses the data management needs of the next-generation sequencing community. Genomic projects underpin almost all aspects of modern biology, including molecular biology, biodiversity studies, and medical research such as research into cancer, vaccines, antibiotics and drug development.
Many research institutions around NSW have purchased next-generation DNA sequencing instruments and need to store, curate, access and analyse the immense amount of data that will be generated.
The new DNA sequencing equipment generates billions of base pairs' worth of sequence data per day, and this volume will only rise. The equipment shifts the bottleneck away from generating DNA data and onto the ongoing data processing, data management and data access needed to keep the information readily available to support research.
This project centralises the effort of several major institutions in the scoping and development work necessary to make effective use of gene sequencing instruments, as well as ensuring centralised computational and data storage facilities can be used effectively in this research.
The project benefits a wide user base including researchers at UNSW, Southern Cross University, the Australian National University and many others. One of the key advantages of the project is that it is designed for easy deployment at other, new sites.
The project was completed in May 2010.
GDA helps experimental scientists manage the data and design of their experiments. This includes wet-lab experiments such as tissue sample preparation, next-generation sequencing and base calling, and tertiary analyses such as BLAST searching. Experimental scientists in genomics face two specific challenges: the volumes of data are enormous, and their experimental designs are subject to continual change. GDA directly addresses both of these issues.
The GDA user interface is designed around experiments and projects. Experimenters encode their experimental design as a set of input and output parameters (i.e. independent and dependent variables). The GDA user interface uses these definitions to allow easy and validated data entry of experiment configurations, as well as reuse and sharing of experimental designs.
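As an illustration of the idea above, the following minimal Python sketch shows how an experimental design might be encoded as typed input and output parameters and then used to validate a data entry. It is not the GDA implementation: the names `Parameter`, `ExperimentDesign` and `validate`, and the example parameters, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Parameter:
    """One variable of a design (hypothetical structure, not GDA's own)."""
    name: str
    kind: type             # expected Python type of the entered value
    role: str              # "input" (independent) or "output" (dependent)
    required: bool = True


@dataclass(frozen=True)
class ExperimentDesign:
    """A reusable design: a named set of parameter definitions."""
    name: str
    parameters: Tuple[Parameter, ...]

    def validate(self, values: Dict[str, Any]) -> List[str]:
        """Return a list of problems; an empty list means the entry is valid."""
        errors: List[str] = []
        known = {p.name for p in self.parameters}
        for p in self.parameters:
            if p.name not in values:
                if p.required:
                    errors.append(f"missing required {p.role} parameter: {p.name}")
                continue
            if not isinstance(values[p.name], p.kind):
                errors.append(
                    f"parameter {p.name} expects {p.kind.__name__}, "
                    f"got {type(values[p.name]).__name__}"
                )
        for name in values:
            if name not in known:
                errors.append(f"unknown parameter: {name}")
        return errors


# Hypothetical design for a sequencing run.
design = ExperimentDesign(
    name="sequencing_run",
    parameters=(
        Parameter("sample_id", str, "input"),
        Parameter("read_length", int, "input"),
        Parameter("total_reads", int, "output", required=False),
    ),
)

good_entry = {"sample_id": "S-001", "read_length": 100}
bad_entry = {"sample_id": "S-001", "read_length": "100"}

print(design.validate(good_entry))
print(design.validate(bad_entry))
```

Because the design is plain data, the same definition could be shared between projects or used to generate an entry form, which is the kind of reuse the paragraph above describes.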
Projects make it easy to grant controlled access to your experiments, with integrated support for including your data in the ANDS Australian Research Data Commons.
Output data from all experiments are transparently compressed to minimise download times and server storage requirements.
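The source does not say which compression scheme GDA uses. As a rough sketch of what "transparent" compression means, the example below assumes gzip (via Python's standard library) and uses hypothetical `store` and `retrieve` helpers: callers hand over and receive the original bytes, while only the compressed form is kept.

```python
import gzip


def store(data: bytes) -> bytes:
    """Compress output data before it is written to storage (gzip is an assumption)."""
    return gzip.compress(data)


def retrieve(blob: bytes) -> bytes:
    """Decompress stored data so the caller sees the original bytes."""
    return gzip.decompress(blob)


# Illustrative, highly repetitive sequence data; real reads compress less well.
reads = b"ACGT" * 10_000
stored = store(reads)

print(len(reads), len(stored))
print(retrieve(stored) == reads)
```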
Each result in the repository comprises the data for a single experiment (both its design and output data). Output data from one experiment is often used as input data to subsequent experiments, forming a sequence of interdependent experiments. GDA supports reuse of experiments, including full scientific auditing of results right back to their source.
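The chain of interdependent experiments can be pictured as each result recording the result it was derived from. The sketch below is illustrative only: `Result`, `derived_from` and `audit_trail` are hypothetical names, and it assumes each result has at most one parent, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Result:
    """A stored result (hypothetical structure, not GDA's data model)."""
    result_id: str
    experiment: str
    derived_from: Optional[str] = None   # id of the result used as input, if any


def audit_trail(repository: Dict[str, Result], result_id: str) -> List[str]:
    """Walk from a result back to its original source, newest first."""
    trail: List[str] = []
    seen = set()
    current: Optional[str] = result_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        result = repository[current]
        trail.append(result.result_id)
        current = result.derived_from
    return trail


results = [
    Result("r1", "tissue sample preparation"),
    Result("r2", "sequencing and base calling", derived_from="r1"),
    Result("r3", "BLAST search", derived_from="r2"),
]
repository = {r.result_id: r for r in results}

print(audit_trail(repository, "r3"))  # ['r3', 'r2', 'r1']
```

Following the `derived_from` links in this way is what allows a derived result to be audited right back to the experiment that produced its source data.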
Intersect’s GDA system provides a repository for experimental results and metadata. The experimental metadata for any facility and type of experiment can be easily configured and the GDA system automatically generates the user interface to allow easy metadata entry.
Access to the GDA system is via a web-based user interface. Upload of experimental results is handled by a Java applet running in the web browser and works across institutional firewalls.
The system offers the following key features:
- a repository for results and metadata based on Fedora Commons,
- full user and group management,
- access management for results,
- grouping of results and users into projects,
- the ability to confer result ownership,
- the ability to create derived results and to relate them to the original experiment,
- export of results to the ANDS Australian Research Data Commons.
Projects undertaken by Intersect are available for all members to use. This project directly supports Illumina and Roche/454 next-generation gene sequencers. If you are interested in this project, contact us at firstname.lastname@example.org or via your local eResearch Analyst.
Project sheet available here
Start Date: September 2009
End Date: May 2010
Clients: The Ramaciotti Centre for Gene Function Analysis, UNSW; the Centre of Plant Conservation Genetics, Southern Cross University
Members: UNSW, Southern Cross University
Technologies used: Fedora Commons, GDA Repository, ANDS Research Data Commons, jQuery, PostgreSQL, Tomcat, Spring.
For any enquiries, please contact Rodney Harrison:
T 61 2 8079 2551
“We can now store massive amounts of genomic data, share it with our colleagues and analyse it in a seamless manner.
This is fundamental infrastructure that underpins genomic research. Without it, we just can’t do the work.”
Professor Marc Wilkins
Ramaciotti Centre for Gene Function Analysis