Data Scientist I/II – Computational Biology

Ochre Bio

Full-time

Hybrid

Oxford, United Kingdom

📍 Oxford Science Park, UK (Hybrid – 1 day/week onsite)
📃 Permanent contract
📅 Start Date: As soon as possible

About Ochre Bio

Ochre Bio develops RNA therapies for chronic liver disease. Our driving vision is to end the need for liver transplants. Globally, over 1.5 million people die of chronic liver disease every year. For the vast majority, the only cure is a liver transplant. With little more than 40,000 transplants performed, this is a health lottery. Ochre exists to change this.

Our science is built on three pillars:

Causal human discovery: The largest global collection of human liver data to uncover new therapeutic targets.
Rigorous human validation: World-leading human models that far outclass traditional animal models.
Better therapeutic translation: Optimised RNA chemistry and biology to bring effective therapies to patients faster.

We're ambitious, curious, and supportive, embracing failure as part of innovation and guided by our three operating values:

Clarke's Law: Be bold. Think big.
Murphy's Law: Fail fast. Learn faster.
Wheaton's Law: Support each other, always.

The Role

Most biotech companies talk about being data-driven. At Ochre, we mean it literally.

Our mission to end the need for liver transplants rests on one of the largest collections of human liver data in the world, and we need someone exceptional to help us build, maintain, and scale the infrastructure that makes it useful.

As a Data Scientist in our Computational Biology team, you'll sit at the intersection of biology and engineering, designing production-grade pipelines, structuring complex omics datasets, and ensuring data is accessible and reproducible across the organisation. You'll collaborate closely with experimental and computational scientists, and contribute to analysis that directly drives our drug discovery pipeline forward.

Key Responsibilities

Design, build, and maintain scalable, production-grade cloud-based data pipelines for biological and omics datasets (e.g., RNA-seq, NGS)
Develop and manage data infrastructure (storage, compute, workflows) in AWS using Infrastructure-as-Code tools (e.g., Terraform)
Define and enforce data models, schemas, and metadata standards for complex biological datasets
Implement robust data validation, quality control, and monitoring processes
Optimise data ingestion, transformation, and access patterns to support downstream analysis and modelling
Develop and maintain reproducible, well-tested codebases using software engineering best practices (version control, CI/CD, documentation)
Collaborate with experimental and computational scientists to ensure data is generated, structured, and captured appropriately
Improve data accessibility, discoverability, and governance across teams
Communicate technical solutions and results clearly to both technical and non-technical stakeholders

Must-haves

MSc or PhD in Computational Biology, Bioinformatics, Data Science, Computer Science, or a related quantitative discipline (or equivalent industry experience)
Strong software engineering skills with proficiency in Python (R a plus) and cloud-based architecture (preferably AWS)
Proven experience building and maintaining data pipelines and data infrastructure in a research or production environment
Experience working with large-scale biological datasets, ideally NGS or other omics data
Solid understanding of data modelling, data architecture, and data management best practices
Experience with workflow orchestration tools (e.g., Nextflow, Airflow, Snakemake, or similar)
Experience with Infrastructure-as-Code tools (e.g., Terraform, CloudFormation)
Ability to understand biological context and collaborate effectively with wet-lab and computational scientists

Nice-to-haves

Experience with data lake or warehouse architectures
Familiarity with databases and query languages (e.g., SQL)
Experience implementing CI/CD for data pipelines or scientific software
Experience contributing to cross-functional platform or infrastructure projects
Knowledge of statistical modelling approaches applied to biological data
