The European Bioinformatics Institute in 2018

Powering Life Sciences with Big Data

Introduction: The Global Hub for Biological Data

Imagine a library that collects not just books, but all biological data produced by scientists worldwide—from DNA sequences and protein structures to clinical research data. Now imagine this library doesn't just store information but makes it freely available, connects related discoveries, and provides tools to analyze it. This is the European Bioinformatics Institute (EMBL-EBI), a powerhouse driving 21st-century biological research.

Central Hub for Biological Data

In 2018, EMBL-EBI served as Europe's central hub for biological data, archiving, curating, and analyzing life sciences data produced by researchers throughout the world [1, 3].

Essential Scientific Infrastructure

EMBL-EBI's work in managing data while developing innovative tools and training programs represented a critical infrastructure for science—as essential to modern biology as laboratories and microscopes.

The Infrastructure: Handling Biology's Big Data

Data Growth and Storage Challenges

By 2018, the volume of data from new sequencing technologies and experimental techniques had reached unprecedented levels. EMBL-EBI reported that its total raw storage capacity exceeded 160 petabytes, and demand was still growing rapidly [1, 3].

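To put this in perspective, if one byte were one grain of rice, 160 petabytes would fill far more than 60 Olympic-sized swimming pools: assuming roughly 0.02 milliliters per grain and 2,500 cubic meters per pool, the figure is on the order of a million pools.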
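
The rice analogy can be sanity-checked with a short calculation. The sketch below is only a rough estimate: the grain volume and pool volume are assumed values chosen for illustration, not figures from EMBL-EBI, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the rice analogy.
# All physical constants below are assumptions, not figures from EMBL-EBI.
BYTES_PER_PETABYTE = 10**15   # decimal petabyte
GRAIN_VOLUME_M3 = 0.02e-6     # assumed ~0.02 mL per grain, including packing gaps
OLYMPIC_POOL_M3 = 2_500.0     # assumed 50 m x 25 m x 2 m pool


def pools_filled(petabytes: float) -> float:
    """Number of Olympic pools filled if every byte were one grain of rice."""
    grains = petabytes * BYTES_PER_PETABYTE
    return grains * GRAIN_VOLUME_M3 / OLYMPIC_POOL_M3


if __name__ == "__main__":
    print(f"{pools_filled(160):,.0f} pools")
```

Doubling or halving the assumed grain volume shifts the answer by the same factor, so only the order of magnitude is meaningful.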

This massive growth was particularly driven by nucleotide sequences archived in the European Nucleotide Archive (ENA) and the European Genome-phenome Archive (EGA) [3]. Notably, proteomics data submitted to the PRIDE database had seen significant growth since 2016, becoming the second-largest storage footprint after nucleotide sequences [3].

[Figure: EMBL-EBI storage growth, 2016-2018]

Engineering for Excellence

To manage these increasing data flows while maintaining service quality, EMBL-EBI made significant infrastructure improvements in 2018:

  • Doubled bandwidth
  • Improved computational efficiency
  • Enhanced interconnectivity

The institute doubled the bandwidth of its connection to the internet, so that researchers could access data quickly regardless of location [1, 3]. It also improved the efficiency of its computational infrastructure, which was crucial for supporting the more than 150 analytical bioinformatics tools it maintained [1].

Storage Growth Table

Year | Total Storage Capacity | Year-over-Year Growth
2016 | 120 petabytes | -
2017 | 140 petabytes | 16.7%
2018 | 160 petabytes | 14.3%
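
The growth percentages in the table above follow directly from the capacity figures. As a minimal sketch (the function and variable names are illustrative, and the capacities are simply copied from the table), the calculation is:

```python
from typing import Dict

# Capacities in petabytes, copied from the storage growth table.
capacity_pb = {2016: 120, 2017: 140, 2018: 160}


def year_over_year_growth(capacity: Dict[int, float]) -> Dict[int, float]:
    """Return percentage growth for each year relative to the previous year."""
    years = sorted(capacity)
    return {
        year: round((capacity[year] - capacity[prev]) / capacity[prev] * 100, 1)
        for prev, year in zip(years, years[1:])
    }


growth = year_over_year_growth(capacity_pb)
print(growth)  # {2017: 16.7, 2018: 14.3}
```

Each year adds the same 20 petabytes, so the percentage falls as the base grows.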

New Tools and Resources: 2018's Innovations

Single Cell Expression Atlas

One of the most significant launches in 2018 was the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc), a new component of the Expression Atlas that addressed the rapid growth of single-cell genomics research [1, 3].

This revolutionary resource allowed scientists to explore gene expression in individual cells, providing unprecedented resolution for understanding cellular diversity and function.

PDBe-Knowledgebase

Another major 2018 innovation was the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), a community-driven resource that collated functional annotations and predictions for structural data in the Protein Data Bank [1, 3].

This resource represented a collaboration between the PDBe team and multiple bioinformatics resources, consolidating curated and enriched data to provide biological context for protein structures.

Key Data Resources at EMBL-EBI in 2018

Resource Category | Example Resources | Primary Function
Genomic Data | European Nucleotide Archive (ENA), European Genome-phenome Archive | Store and provide access to raw sequence data
Protein Data | PDBe-Knowledgebase, UniProt | Curate and analyze protein sequences and structures
Gene Expression | Single Cell Expression Atlas, Expression Atlas | Display gene expression patterns across conditions and tissues
Literature | Europe PMC | Provide access to scientific publications and preprints
Tools & Analysis | Over 150 bioinformatics tools | Enable data analysis through web interfaces and APIs

In-depth Look: Training the Next Generation of Scientists

The 2018 Summer School - A Case Study in Bioinformatics Education

Each year, EMBL-EBI ran specialized training courses to equip researchers with essential bioinformatics skills. The 2018 Summer School in Bioinformatics provides an excellent example of how the institute translated complex data resources into practical research skills [4].

Methodology: Learning Through Collaborative Projects

The summer school employed a learn-by-doing approach where participants worked in small groups on realistic research challenges set by EMBL-EBI experts [4]. The course spanned five days, combining theoretical instruction with hands-on project work.

  • Real-life data analysis: Participants worked with real-life proteomics data from clinical tumor samples.
  • Tool application: They analyzed the proteomics data using EMBL-EBI tools and resources.
  • Biological interpretation: They interpreted their results using the Open Targets Platform.
  • Knowledge sharing: The project culminated in group presentations where participants shared their findings.

Results and Impact: Building Bioinformatics Capacity

This training approach produced significant outcomes for participants. Post-course assessments showed that researchers gained practical skills in:

  • Browsing, searching, and retrieving biological data
  • Using appropriate bioinformatics tools
  • Understanding how biological data is stored and organized
  • Interconnecting related biological data

The summer school represented just one facet of EMBL-EBI's comprehensive training program, which included on-site, off-site, and web-based training opportunities for thousands of researchers worldwide in 2018 [1].

Bioinformatics Resources and Tools at EMBL-EBI (2018)

Resource/Tool | Type | Primary Function
Single Cell Expression Atlas | Data Resource | Explore gene expression in individual cells across different conditions
PDBe-Knowledgebase | Data Resource | Access functional annotations and predictions for protein structures
Europe PMC | Literature Resource | Search scientific publications and preprint abstracts
EBI Search | Discovery Tool | Search across multiple EMBL-EBI data resources simultaneously
API Services | Programmatic Access | Enable high-throughput data access and analysis through code
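
To illustrate what programmatic access can look like in practice, here is a minimal sketch that queries the Europe PMC search service from Python. The base URL, the parameter names (`query`, `format`, `pageSize`), and the response layout reflect the public REST documentation as understood here and should be treated as assumptions to verify before use. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the Europe PMC search endpoint; verify against current docs.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 5) -> str:
    """Build a Europe PMC search URL that requests JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urllib.parse.urlencode(params)}"


def fetch_titles(query: str, page_size: int = 5) -> list:
    """Fetch matching records and return their titles (requires network access)."""
    url = build_search_url(query, page_size)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # The response layout ("resultList" -> "result" -> "title") is assumed here.
    results = payload.get("resultList", {}).get("result", [])
    return [hit.get("title", "") for hit in results]
```

Calling `fetch_titles("single cell expression atlas")` would return up to five publication titles if the service responds as assumed. Keeping URL construction separate from the network call makes the query logic easy to test offline.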

Conclusion: Enabling the Future of Biology

As we reflect on EMBL-EBI's work in 2018, the institute's impact extends far beyond simply storing biological data. Through its sophisticated infrastructure, innovative tools, and comprehensive training programs, EMBL-EBI created an ecosystem where data could be transformed into biological insights.

The 2018 updates—including the Single Cell Expression Atlas, PDBe-Knowledgebase, and enhanced computational infrastructure—demonstrated EMBL-EBI's commitment to staying at the forefront of scientific progress. By making all data and tools freely available worldwide and ensuring interoperability between resources, the institute embodied the principles of open science that accelerate discovery across all areas of biology and medicine.

As biological data continues to grow exponentially, the work of institutions like EMBL-EBI becomes increasingly vital. Their approach to data management, tool development, and researcher training provides a blueprint for how to harness the power of big data to advance human health, understand biological systems, and train the next generation of scientists.

In the landscape of modern biology, EMBL-EBI serves not just as a repository of information, but as an active engine of discovery, enabling research that benefits people throughout the world.

References