Powering Life Sciences with Big Data
Imagine a library that collects not just books, but all biological data produced by scientists worldwide—from DNA sequences and protein structures to clinical research data. Now imagine this library doesn't just store information but makes it freely available, connects related discoveries, and provides tools to analyze it. This is the European Bioinformatics Institute (EMBL-EBI), a powerhouse driving 21st-century biological research.
By managing data while developing innovative tools and training programs, EMBL-EBI provided critical infrastructure for science, as essential to modern biology as laboratories and microscopes.
By 2018, the flood of data from new sequencing technologies and experimental techniques had reached unprecedented levels. EMBL-EBI reported that its total raw storage capacity exceeded 160 petabytes, a volume that was still growing rapidly 1 3 .
To put this in perspective, if one byte were one grain of rice, 160 petabytes would be about 1.6 × 10^17 grains, which by a rough estimate would fill on the order of a million Olympic-sized swimming pools.
This growth was driven largely by nucleotide sequences archived in the European Nucleotide Archive (ENA) and the European Genome-phenome Archive (EGA) 3 . Notably, proteomics data submitted to the PRIDE database had grown significantly since 2016, becoming the second-largest storage footprint after nucleotide sequences 3 .
To manage these increasing data flows while maintaining service quality, EMBL-EBI made significant infrastructure improvements in 2018. The institute doubled the bandwidth of its internet connection, ensuring researchers could access data quickly regardless of location 1 3 . It also improved the efficiency of its computational infrastructure, crucial for supporting the more than 150 analytical bioinformatics tools it maintained 1 .
EMBL-EBI Storage Growth (2016-2018)
| Year | Total Storage Capacity | Year-over-Year Growth |
|---|---|---|
| 2016 | 120 petabytes | - |
| 2017 | 140 petabytes | 16.7% |
| 2018 | 160 petabytes | 14.3% |
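The year-over-year column in the table above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the variable and function names are assumptions made for this example, and the capacities are the rounded figures from the table.

```python
# Rounded storage capacities in petabytes, copied from the table above.
capacity_pb = {2016: 120, 2017: 140, 2018: 160}


def year_over_year_growth(capacities):
    """Return {year: percent growth over the previous listed year}."""
    years = sorted(capacities)
    growth = {}
    for previous, current in zip(years, years[1:]):
        change = capacities[current] - capacities[previous]
        # Growth is measured relative to the earlier year's capacity.
        growth[current] = round(100 * change / capacities[previous], 1)
    return growth


if __name__ == "__main__":
    for year, percent in year_over_year_growth(capacity_pb).items():
        print(f"{year}: {percent}%")
```

Both steps add the same 20 petabytes, so the percentage falls in the second year because the base is larger.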
One of the most significant launches in 2018 was the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc), a new component of the Expression Atlas that addressed the rapid growth of single-cell genomics research 1 3 .
This revolutionary resource allowed scientists to explore gene expression in individual cells, providing unprecedented resolution for understanding cellular diversity and function.
Another major 2018 innovation was the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), a community-driven resource that collated functional annotations and predictions for structural data in the Protein Data Bank 1 3 .
This resource represented a collaboration between the PDBe team and multiple bioinformatics resources, consolidating curated and enriched data to provide biological context for protein structures.
| Resource Category | Example Resources | Primary Function |
|---|---|---|
| Genomic Data | European Nucleotide Archive (ENA), European Genome-phenome Archive | Store and provide access to raw sequence data |
| Protein Data | PDBe-Knowledgebase, UniProt | Curate and analyze protein sequences and structures |
| Gene Expression | Single Cell Expression Atlas, Expression Atlas | Display gene expression patterns across conditions and tissues |
| Literature | Europe PMC | Provide access to scientific publications and preprints |
| Tools & Analysis | Over 150 bioinformatics tools | Enable data analysis through web interfaces and APIs |
Each year, EMBL-EBI ran specialized training courses to equip researchers with essential bioinformatics skills. The 2018 Summer School in Bioinformatics provides an excellent example of how the institute translated complex data resources into practical research skills 4 .
The summer school employed a learn-by-doing approach where participants worked in small groups on realistic research challenges set by EMBL-EBI experts 4 . The course spanned five days, combining theoretical instruction with hands-on project work.
1. Participants worked with real-life proteomics data from clinical tumor samples.
2. They analyzed the proteomics data using EMBL-EBI tools and resources.
3. They interpreted their results using the Open Targets Platform.
4. The project culminated in group presentations where participants shared their findings.
This training approach produced significant outcomes for participants. Post-course assessments showed that researchers gained practical skills from the course.
The summer school represented just one facet of EMBL-EBI's comprehensive training program, which included on-site, off-site, and web-based training opportunities for thousands of researchers worldwide in 2018 1 .
| Resource/Tool | Type | Primary Function |
|---|---|---|
| Single Cell Expression Atlas | Data Resource | Explore gene expression in individual cells across different conditions |
| PDBe-Knowledgebase | Data Resource | Access functional annotations and predictions for protein structures |
| Europe PMC | Literature Resource | Search scientific publications and preprint abstracts |
| EBI Search | Discovery Tool | Search across multiple EMBL-EBI data resources simultaneously |
| API Services | Programmatic Access | Enable high-throughput data access and analysis through code |
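As a hedged illustration of the programmatic access listed in the table, the sketch below builds a literature search request for Europe PMC and pulls titles out of a JSON response. None of the specifics come from this article: the endpoint URL, the `query`, `format`, and `pageSize` parameters, and the `resultList`, `result`, and `title` response keys are assumptions based on Europe PMC's public REST interface and should be checked against its current documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Europe PMC REST search service (not from the article).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=5):
    """Build a search URL; the parameter names are assumptions to verify."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def extract_titles(payload):
    """Pull titles from a decoded response, tolerating missing keys."""
    results = payload.get("resultList", {}).get("result", [])
    return [item["title"] for item in results if "title" in item]


def search_titles(query, page_size=5, timeout=30):
    """Run the search over the network and return the matching titles."""
    with urlopen(build_search_url(query, page_size), timeout=timeout) as response:
        return extract_titles(json.load(response))


# Example call (needs network access):
#     search_titles("single cell expression atlas")
```

Keeping URL construction and response parsing apart from the network call means both can be checked offline, which is useful when a service's response format changes.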
As we reflect on EMBL-EBI's work in 2018, the institute's impact extends far beyond simply storing biological data. Through its sophisticated infrastructure, innovative tools, and comprehensive training programs, EMBL-EBI created an ecosystem where data could be transformed into biological insights.
The 2018 updates—including the Single Cell Expression Atlas, PDBe-Knowledgebase, and enhanced computational infrastructure—demonstrated EMBL-EBI's commitment to staying at the forefront of scientific progress. By making all data and tools freely available worldwide and ensuring interoperability between resources, the institute embodied the principles of open science that accelerate discovery across all areas of biology and medicine.
As biological data continues to grow exponentially, the work of institutions like EMBL-EBI becomes increasingly vital. Their approach to data management, tool development, and researcher training provides a blueprint for how to harness the power of big data to advance human health, understand biological systems, and train the next generation of scientists.
In the landscape of modern biology, EMBL-EBI serves not just as a repository of information, but as an active engine of discovery, enabling research that benefits people throughout the world.