Xbio.ca

Population genetics, ancient DNA, AI, and the code I write to make sense of it all

01

About

Alaina

Hi! My name is Alaina, and I'm a Canadian data scientist who taught herself molecular and computational biology starting in 2011. I spend a lot of my free time doing bioinformatics—population genetics, ancient DNA, that sort of thing.

This is where I share what I'm working on and the tools I build along the way. The "X" in xbio doesn't mean anything specific. (Exponential biology? The X chromosome? The unknown variable? Pick your favourite.)

I work in AI and health science, but the stuff on this site is just me chasing haplogroups through time because I think it's interesting. I also think I can bring 35+ years of software engineering experience to making science more reproducible and accessible.

We're a bike-only family living in Toronto, so I'm also very concerned with cycling safety. That's where the Toronto Cycling Data project comes from.

02

Explorations

Write-ups of what I'm working on. Mostly population genetics and ancient DNA, but I also spend a lot of time building and supervising AI coding agents, so occasionally that shows up here too.

I Supervised AI Coding Agents for Hundreds of Hours. Here's What Goes Wrong.

I analysed 1.65 GB of production traces from Claude Code and Cursor to understand how AI coding agents systematically optimize for "done" over "correct." Seven categories of misalignment, a working detection pipeline, and some worrying findings about what monitors can and can't catch.

No, the Jakobsson Paper Doesn't Disprove Out of Africa

A response to viral misinterpretations of recent African origins research. The paper supports African origins; it just refines the geography.

Exploring the Growth of mtDNA Diversity

Can we reliably identify new haplogroups with constant data input? Building a system to discover missing branches in the mitochondrial tree.

Making Haplogroup Callers Work with Ancient DNA

Ancient DNA is a mess—damage, low coverage, missing data. Here's how I think about building classifiers that can handle it.

Haplogroup Classification at Scale

How yallHap works, and what I learned building a Y-chromosome haplogroup caller that doesn't choke on 185K SNPs or ancient DNA.

Building Your Own Bioinformatics Data Library

How I assembled ~30TB of reference genomes, population panels, and ancient DNA samples to learn population genetics by doing.

Detecting Ghost Populations with f-Statistics

How to find ancestry from populations we've never sampled, using patterns of shared genetic drift.

03

Tools

Open source tools for genomics research. First-class support for ancient DNA, but tastes great with modern DNA too.

yallHap

Y-chromosome haplogroup classifier

An ancient-DNA-friendly Y haplogroup caller using the YFull tree with 185K SNPs. Features Bayesian scoring designed specifically for aDNA damage patterns, achieving 99.9% accuracy on modern samples and 90.7% on ancient DNA.

eveHap

Mitochondrial haplogroup classifier

An ancient-DNA-friendly mtDNA haplogroup caller using probabilistic approaches optimized for damaged and low-coverage samples.