Friday, 12 May 2017

Bioinformatics: The Age of Illumina

Illumina has so dominated bioinformatics since 2008 that it has started to feel like the default. It seems normal that much of our field is building software and diagnostic assays around their short-read technology.

Our facility recently got a PacBio, so I started to think about read lengths and what they mean. Over a decade we had a massive drop in the price of reads, and read lengths moved from 30 to 250 bases, but that's still relatively short. PacBio reads are expensive, but they're so much longer that they completely resolve problems which could previously only be solved through statistics and money.

With short reads we have uncertainty about genotypes: does someone have a different sequence at this location? At the moment we map short reads against a reference genome, then use population frequencies to filter down to rare variants. This approach is entirely a product of the relative cost of short and long reads.
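
As a rough sketch of that filtering step, assuming the variants have already been called and annotated: the `Variant` record, its `pop_af` field and the 1% cutoff are all made up for illustration, not taken from any particular pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical record for one called variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    pop_af: float  # assumed: allele frequency in some reference population


def rare_variants(variants, max_af=0.01):
    """Keep only variants at or below the population frequency cutoff."""
    return [v for v in variants if v.pop_af <= max_af]


# Toy data: one common variant, one rare one.
calls = [
    Variant("chr1", 100, "A", "G", 0.30),
    Variant("chr1", 200, "C", "T", 0.001),
]
rare = rare_variants(calls)
```

Here `rare` ends up holding only the variant at position 200; the common one is assumed to be harmless and thrown away.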

We also have uncertainty about isoforms: we count reads across exon junctions, then perform complicated statistics to infer which isoforms are present. Given long enough reads, each one would span a whole transcript and we could just count them.
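
To show how simple "just count them" could be, here is a minimal sketch. It assumes each long read covers a full transcript and has already been reduced to the ordered list of exons it contains; the exon names and the `count_isoforms` helper are hypothetical.

```python
from collections import Counter


def count_isoforms(read_exon_chains):
    """Count reads per isoform, where an isoform is its ordered exon chain."""
    return Counter(tuple(chain) for chain in read_exon_chains)


# Toy data: three full-length reads from one gene.
reads = [
    ["e1", "e2", "e3"],
    ["e1", "e3"],        # exon 2 skipped
    ["e1", "e2", "e3"],
]
counts = count_isoforms(reads)
```

With this toy input, the full-length isoform is seen twice and the exon-skipping isoform once. No junction statistics are needed, though real reads would still need error handling and a check that they truly are full length.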

So how much are we living in the age of Illumina? It's hard to tell when you're so immersed in it. I feel that while much of the difficulty of the field might disappear were read length to get much longer, the core of it would continue to be useful and necessary.

Core Bioinformatics

Once short read problems are resolved, what use are we?

What protein levels are in this cell or patient?
What is the genotype of this patient or cell? How does it compare with earlier times, or vs healthy people, or people with the same disease?
