From Code to Cures

The Indian-American Scientists Revolutionizing Bioinformatics and Genomics

Across the United States, a vibrant community of Indian-origin scientists is helping transform how we understand life itself. They are the architects and engineers of bioinformatics and genomics—fields that mine the vast digital universe of genetic data for insights that can improve human health, combat disease, and feed the planet.

The Invisible Architects of Modern Biology

In a California lab, Dr. Venkatesan Sundaresan studies the intricate genetic blueprint of rice, searching for keys to global food security. Meanwhile, in New York, Dr. Shruti Naik deciphers the complex conversation between immune cells and skin tissue, seeking breakthroughs for inflammatory diseases. What connects these scientists—besides their groundbreaking work—is their shared heritage and their position at the forefront of a scientific revolution: the fusion of biology with computational science.

Bioinformatics: developing methods and software for understanding biological data

Genomics: studying the structure, function, evolution, and mapping of genomes

These researchers stand at the intersection of tradition and innovation, building upon a storied legacy of Indian scientific excellence while pioneering new frontiers in data-driven biology ¹.

The Pioneers: Building Bridges Between Disciplines

Indian-born scientists have a long history of groundbreaking contributions to Western science, with luminaries like Har Gobind Khorana (who shared the 1968 Nobel Prize in Physiology or Medicine for deciphering the genetic code) setting the stage for today's researchers. The current generation builds upon this foundation, navigating complex biological questions with sophisticated computational tools.

Scientist	Institutional Affiliation	Key Contributions	Honors
Venkatesan Sundaresan	UC Davis	Plant reproduction and synthetic apomixis for crop improvement	Wolf Prize in Agriculture (2024), US National Academy of Sciences ¹ ³
Shruti Naik	Icahn School of Medicine at Mount Sinai	Immunology, stem cell biology, and tissue inflammation	L'Oréal-UNESCO Award, NIH Director's New Innovator Award ¹ ³
Inder Verma	Salk Institute (emeritus)	Cancer biology and viral vectors for gene therapy	Former editor-in-chief of PNAS ¹
Utpal Banerjee	UCLA	Genetics, developmental biology, and stem cell research	NIH Director's Pioneer Award, US National Academy of Sciences ¹
Anshul Kundaje	Stanford University	Computational genomics and machine learning	Organizer of Genome Informatics conference ⁹

Common threads: interdisciplinary training, international collaboration, and mentorship and leadership.

The Data Deluge: When Biology Became an Information Science

The turning point came with the completion of the Human Genome Project in 2003, which delivered the first essentially complete reference sequence of human DNA. This milestone marked biology's transformation into a data-intensive science, a field where the primary challenge shifted from gathering information to making sense of overwhelming volumes of it.

"The shift from ASM NGS to ASM BIG mirrors the rapid transformation of microbial sciences, where the challenge is no longer just sequencing data—it is making sense of it, managing it and applying it to solve real-world problems," noted organizers of the newly launched ASM Bioinformatics, Genomics and Big Data Conference ² .

Human Genome Project (2003)

Time to sequence one human genome: ~13 years
Cost per genome: ~$3 billion
Data generated per genome: ~3 GB
Largest dataset size: One reference genome

Current Large Studies (2025)

Time to sequence one human genome: ~1 day
Cost per genome: ~$500
Data generated per genome: ~200 GB (raw data, with each position sequenced multiple times)
Largest dataset size: GenomeIndia: 9,772 genomes simultaneously analyzed ⁷
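
To make the scale of this shift concrete, the approximate figures above can be turned into fold changes. This is a rough sketch, not a measurement: the variable names are illustrative, and applying the ~200 GB per-genome figure to all 9,772 GenomeIndia genomes is an assumption made only to estimate an order of magnitude.

```python
# Back-of-the-envelope comparison using the approximate figures quoted above.
HGP_2003 = {"days": 13 * 365, "cost_usd": 3_000_000_000, "gb_per_genome": 3}
STUDY_2025 = {"days": 1, "cost_usd": 500, "gb_per_genome": 200}
GENOMEINDIA_GENOMES = 9_772  # genomes analyzed together


def fold_change(before, after):
    """How many times larger `before` is than `after`."""
    return before / after


speedup = fold_change(HGP_2003["days"], STUDY_2025["days"])
cost_drop = fold_change(HGP_2003["cost_usd"], STUDY_2025["cost_usd"])

# Assumption: every GenomeIndia genome produced roughly 200 GB of raw data.
total_tb = GENOMEINDIA_GENOMES * STUDY_2025["gb_per_genome"] / 1000

print(f"Sequencing is roughly {speedup:,.0f}x faster")
print(f"Cost per genome fell roughly {cost_drop:,.0f}-fold")
print(f"Estimated raw data for the cohort: about {total_tb:,.0f} TB")
```

Under these assumptions the cohort alone approaches two petabytes of raw data, which is why storage and analysis, rather than sequencing itself, have become the bottleneck.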

[Figure: Genomic Data Growth Over Time]

Case Study: The GenomeIndia Project—Mapping Genetic Diversity

While the GenomeIndia project is based in India, its implications ripple across global science, and many Indian-origin researchers in the US contribute to similar large-scale genomic initiatives. It exemplifies the type of research that this community is advancing worldwide.

Methodology: A Step-by-Step Approach

Sample Collection

Researchers gathered blood samples from 10,074 healthy and unrelated Indians representing 85 diverse populations across the country, including both tribal and non-tribal groups ⁷.

DNA Extraction and Sequencing

Using advanced sequencing machines, the team read the complete genetic code of each participant, generating strings of A's, T's, C's, and G's that make up their individual genomes.

Variant Calling

Bioinformatics specialists compared each sequenced genome to a reference human genome, identifying points of difference called "variants."

Population Analysis

Computational biologists grouped the genetic variants based on which populations they appeared in, identifying which were common across groups and which were unique to specific communities.
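
The last two steps can be illustrated with a toy sketch. It is deliberately simplified and is not the GenomeIndia pipeline: it assumes every sample sequence is already aligned to the reference and has the same length, it only detects single-base substitutions, and the sequences, sample IDs, and population labels are made up for illustration.

```python
from collections import defaultdict


def call_variants(reference, sample):
    """Return {position: (ref_base, alt_base)} for single-base mismatches.

    Positions are 1-based. Assumes `sample` is pre-aligned to `reference`
    and has the same length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("toy caller needs sequences of equal length")
    return {
        pos: (ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    }


def variants_by_population(reference, samples):
    """Map each variant (pos, ref, alt) to the set of populations carrying it."""
    seen = defaultdict(set)
    for population, sequence in samples.values():
        for pos, (ref, alt) in call_variants(reference, sequence).items():
            seen[(pos, ref, alt)].add(population)
    return dict(seen)


def split_shared_and_private(variant_pops):
    """Separate variants seen in several populations from those seen in one."""
    shared = {v for v, pops in variant_pops.items() if len(pops) > 1}
    private = {v: next(iter(pops)) for v, pops in variant_pops.items() if len(pops) == 1}
    return shared, private


# Hypothetical data: a 10-base reference and four samples from three populations.
REFERENCE = "ACGTACGTAC"
SAMPLES = {
    "s1": ("pop_A", "ACGTACGTAC"),  # identical to the reference
    "s2": ("pop_A", "ACGAACGTAC"),  # position 4: T>A
    "s3": ("pop_B", "ACGAACGTAT"),  # position 4: T>A, position 10: C>T
    "s4": ("pop_C", "GCGTACGTAC"),  # position 1: A>G
}

variant_pops = variants_by_population(REFERENCE, SAMPLES)
shared, private = split_shared_and_private(variant_pops)
print("Shared across populations:", sorted(shared))
print("Private to one population:", sorted(private.items()))
```

Real variant callers work from millions of short, error-prone reads and model sequencing quality statistically, but the underlying idea is the same: find where a genome differs from the reference, then ask which groups carry each difference.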

Results and Analysis: A Treasure Trove of Genetic Variation

The preliminary findings, published in Nature Genetics in April 2025, revealed 180 million genetic variants: positions in the DNA sequence where the studied individuals differed from one another or from the reference human genome ⁷.

180M genetic variants identified
85 populations represented
9,772 individuals analyzed

Category	Finding	Potential Application
Total variants identified	180 million	Baseline for Indian population genomics
Populations represented	85 groups (32 tribal, 53 non-tribal)	Understanding population-specific disease risks
Sample size	9,772 individuals (after quality control)	Statistically powerful dataset for rare variants
Data repository	Indian Biological Data Centre (IBDC)	Resource for global research community ⁷
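
The headline numbers can be cross-checked with a few lines of arithmetic. This sketch assumes that the whole gap between the 10,074 samples collected and the 9,772 analyzed is due to quality control, as the table suggests; the per-population average is illustrative only, since the source does not give actual group sizes.

```python
# Figures quoted in the text and table above.
collected = 10_074           # blood samples gathered
analyzed = 9_772             # individuals remaining after quality control
tribal, non_tribal = 32, 53  # population groups

excluded = collected - analyzed
excluded_pct = 100 * excluded / collected
populations = tribal + non_tribal

# Illustrative average only: real group sizes are not given in the source.
mean_per_population = analyzed / populations

print(f"Excluded during quality control: {excluded} ({excluded_pct:.1f}%)")
print(f"Populations: {populations}")
print(f"Average individuals per population: about {mean_per_population:.0f}")
```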

Dr. Kumarasamy Thangaraj, one of the project leaders, explained: "We are looking for variants which are functionally relevant—related to diseases, those associated with therapeutic responses or no responses, and those that are causing adverse effects to therapeutic agents" ⁷ .

The Scientist's Toolkit: Essential Research Reagents and Solutions

Behind every genomic discovery lies an array of specialized tools—both wet-lab reagents and dry-lab computational solutions. Here are the essential components powering this research revolution:

Tool Category	Specific Examples	Function
Sequencing Technologies	Illumina NovaSeq, Oxford Nanopore	Determine the order of nucleotides in DNA/RNA molecules
Bioinformatics Pipelines	BWA, GATK, Cell Ranger	Process raw sequencing data into analyzable formats
Data Visualization Tools	Tableau, Canva, Genomic Browsers	Create engaging visual representations of complex data ⁶
Programming Languages	Python, R, Bash	Develop custom analyses and automate workflows
Specialized Databases	Indian Biological Data Centre (IBDC), NCBI	Store and retrieve genomic information ⁷
Computational Environments	Jupyter Notebooks, Galaxy Project	Interactive analysis and reproducible research
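
As a rough illustration of how the pipeline tools in the table fit together, the sketch below assembles the command lines for a minimal align-then-call workflow with BWA and GATK. Several things here are assumptions rather than details from the article: the file names and the `build_pipeline` helper are hypothetical, `samtools` (not listed above) is brought in for sorting and indexing, and the reference is assumed to be indexed already. The code only builds and prints the commands; it does not run them.

```python
import shlex


def build_pipeline(ref_fasta, fastq_r1, fastq_r2, sample_id, threads=4):
    """Return command lines (as argument lists) for a minimal BWA + GATK run.

    Hypothetical helper for illustration. Commands 1 and 2 are meant to be
    joined by a shell pipe, because `bwa mem` writes alignments to stdout.
    """
    bam = f"{sample_id}.sorted.bam"
    gvcf = f"{sample_id}.g.vcf.gz"
    # GATK needs read-group information; BWA expects literal "\t" separators.
    read_group = f"@RG\\tID:{sample_id}\\tSM:{sample_id}\\tPL:ILLUMINA"
    return [
        # 1. Align paired-end reads to the reference genome.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         ref_fasta, fastq_r1, fastq_r2],
        # 2. Sort the alignments read from stdin ("-") into a BAM file.
        ["samtools", "sort", "-o", bam, "-"],
        # 3. Index the sorted BAM so it can be accessed by position.
        ["samtools", "index", bam],
        # 4. Call variants for this sample, writing a per-sample GVCF.
        ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", bam,
         "-O", gvcf, "-ERC", "GVCF"],
    ]


commands = build_pipeline("reference.fa", "reads_R1.fastq.gz",
                          "reads_R2.fastq.gz", "sample01")
for cmd in commands:
    print(shlex.join(cmd))
```

Production workflows add steps this sketch leaves out, such as duplicate marking and joint genotyping across samples, and are usually run through a workflow manager rather than by hand.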

Most Used Programming Languages

Python: 92%
R: 78%
Bash/Shell: 65%

[Figure: Tool Usage Distribution]

Future Frontiers: Where Do We Go From Here?

The future of bioinformatics and genomics shines with possibilities, many being shaped by Indian-origin scientists in key leadership roles. The field is rapidly evolving toward:

AI Integration

Machine learning algorithms detecting patterns in genomic data that escape human observation ⁹.

Single-Cell Omics

Examining genetic activity in individual cells rather than bulk tissue ⁹.

Multi-Omics Integration

Combining genomic data with proteins, metabolites, and environmental factors.

Enhanced Visualization

Virtual reality environments to explore molecular structures in 3D.
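
To give a flavor of the AI integration item above, here is a minimal sketch of one-hot encoding, a common first step for feeding DNA sequences to a machine learning model. The function name and the choice to encode unknown bases as all zeros are illustrative assumptions, not a method attributed to any researcher named in this article.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Columns follow the order A, C, G, T. Any other symbol (such as N for an
    unknown base) becomes a row of zeros.
    """
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


encoded = one_hot("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

Each sequence becomes a numeric grid that a model can scan for recurring patterns, in the same way an image model scans pixels.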

As Dr. Todd Treangen of Rice University notes, "ASM BIG recognizes that the microbial sciences are experiencing a data revolution. Our goal is to convene the researchers, clinicians and data experts who are not only generating microbial data, but also transforming it into scientific advances that address global challenges" ².

A Community Writing the Code of Life

The story of Indian-origin scientists in bioinformatics and genomics is more than a tale of individual achievement—it's about building bridges between cultures, disciplines, and data domains.

Global Impact

From improving crops to advancing medicine

Collaborative Spirit

Building on Khorana's legacy while mentoring new generations

Scientific Innovation

Fusing computational sophistication with biological insight

As we continue to unravel the complexities of life through data, this vibrant scientific community will undoubtedly play an essential role in writing the next chapter of biological discovery—one line of code, one genetic variant, and one breakthrough at a time.