The Experimental Man

My Genome Via E-mail

Trying to understand the six billion nucleotides—all of my DNA—that just arrived in my in-box.

David Ewing Duncan 09/02/2011


A few weeks back I received my complete genome by e-mail.

Actually, the e-mail provided a link to my raw data, a 690 MB file, a tad too large to send in toto by e-mail.

What I got was endless lines of nucleotides (As, Ts, Cs and Gs) divided up by chromosome in a report prepared by the California-based sequencing company Complete Genomics. They generously ran my genome for free, after considerable cajoling by me, so that I could report on the experience. (The dramatic decrease in price for sequencing whole genomes, from perhaps a million dollars three years ago to about $5,000 today, helped persuade them.) The project was also championed by Harvard geneticist George Church and his Personal Genome Project (PGP), which has posted my results and given me the designation PGP 13. I am the 13th person to be sequenced for the project, which is aiming to collect 100,000 genomes. (Other PGPers include tech guru Esther Dyson, Harvard psychologist and author Steven Pinker, and Church himself.)

The mass of code delivered to me holds clues about whether I'm at a higher risk than most people for everything from heart attack and certain cancers to Alzheimer's disease and, more controversially, depression and other behavioral conditions. It contains tips about drugs that may not work for me, or that might inflict dangerous side effects. (Check out my book, Experimental Man, for details about some of these findings from previous testing).

One day, this data will be used in tandem with the stem cell line created for me by Cellular Dynamics International (see my feature article, "Growing Heart Cells Just for Me"). These stem cells (created by bioengineering cells from my blood, which I sent to the company) are similar to those cells that appear a few days after a human egg is fertilized. They can grow into any cell in the body, including the heart, brain, and liver. These cells, guided by the clues in my genome, could be used to refine predictions or diagnoses for diseases, or one day could be used to provide replacement cells should I get whacked in the head or have a heart attack.

But what have I learned from my complete genome that I didn't already know?

If you have been following the Experimental Man Project, you know that I already have results for thousands of genetic markers (genotypes) associated with disease, and with traits that run the gamut from predicting that I have blue eyes (easy to verify) to a higher than normal risk for becoming a heroin addict (I've never actually been interested in the Big H).

These genotypes came from numerous tests, labs, and companies that include the likes of 23andme, Navigenics, Illumina, Affymetrix, the Coriell Institute's Personalized Medicine Collaborative, and Quest Diagnostics. Their tests, however, identified perhaps 2 million genetic markers out of the billions in each of my cells. These were targeted to be among the short list of markers inside a human that seem to be most important in influencing disease and other traits, yet they missed significant portions of my genome that have now been captured by the Complete sequence.

As whole genomes become less expensive and more common, with hundreds of them now sequenced, scientists are discovering that subtle and often rare differences among people may be linked to even common diseases such as heart disease and diabetes. This may explain why many of the genetic markers identified by geneticists for common diseases seem to have a surprisingly small impact on whether a person actually gets cancer or diabetes, suggesting that as-yet-undiscovered genes and other factors are at work.

I'm just beginning to sift through my data from Complete Genomics, but I have already discovered one big difference from my previous testing: a near doubling of my total genotypes identified (referred to as "annotated"), from around 11,000 before to over 21,000 now. This analysis comes from SNPedia, a wiki-style website that devotes a page to each of thousands of individual genetic markers. The site's founder and curator, Michael Cariaso, has developed a program called Promethease that anyone with DNA data can use to create a list of genotypes drawn from SNPedia's individual pages.

As of this writing, my total "genotypes annotated" equals 21,621—a number that will go up as more genotypes are identified in the scientific literature.

Here is how SNPedia's Michael Cariaso described my results in an e-mail:

SNPedia is now watching over 100 genomes closely. Your genome now has the most detailed report known. This is due to the combined effects of your Complete Genomics full genome and your microarrays [previous tests], putting your combined at ~22k. With your recent arrival I think you're likely to hold the lead for the rest of this year, and perhaps well beyond.

Two important challenges arise as I begin to analyze my data. One is that tools for interpreting whole genomes remain nascent, as companies and labs that have been hell-bent on building better and cheaper methods for sequencing begin to turn to the much more Herculean task of understanding what all of this code means. The other is that many of the genetic markers remain preliminary, based on statistical analyses that compare people who have, say, heart disease to those who don't. Only a tiny percentage of the findings from these "Genome Wide Association Studies (GWAS)" have been clinically validated in real people to see if the risk factors indicated by the statisticians actually materialize. (GWAS is also becoming a misnomer and needs updating, since these markers don't really come from whole genomes.)

This second challenge requires a massive effort akin to the Human Genome Project to systematically validate the tens of thousands of genotypes that have been identified so far by scientists. This task will be greatly aided by the proliferation of whole genomes as the price comes down.

In the end, though, the real question is: has this crush of data changed my life? For that, I'll need to post another blog, or several. So stay tuned.

This is an appeal: Send me your ideas for how best to interpret my newly sequenced complete genome!


Mary Mangan

3 Comments

  • 625 Days Ago
  • 09/02/2011

What I would do with my data...

My plan would be 3 major steps, but this summary leaves a lot of "miracle happens here" pieces still.

1) Assessment and QC
2) Build myself a personal browser
3) Look closer at well-characterized and medically-relevant genes and structural variations

But this would be a big project, and unending as new literature comes out daily.

I've talked in a bit more detail at the link below, and I have raised your question among some genomics/bioinformatics geeks at BioStar (linked at the bottom of my post).

http://www.personalgenomics.us/1452/experimental-mans-full-dna-monty/

But I'm really interested to see what other people say.


Mary Mangan

3 Comments

  • 624 Days Ago
  • 09/03/2011

Re: What I would do with my data...

So I have posed this question at BioStar, a collection of bioinformatics geeks actively working in genomics. Currently the top rated answer there is that they don't want to know.

http://biostar.stackexchange.com/questions/11752/what-would-you-do-with-your-personal-genome-data

But it's a holiday weekend; it may get more active next week.


zrzzz

84 Comments

  • 624 Days Ago
  • 09/03/2011

Careful!

Your insurance company gets a hold of this and discovers you have a predisposition for cancer or heart disease, or Alzheimer's or some other slow, costly end-of-life scenario, and they'll drop you like a ton of bricks. They already have access to every shred of your medical records, so telling your doctor both parents died of something inheritable is a bad idea too.


AlanOpp

1 Comment

  • 624 Days Ago
  • 09/03/2011

Re: Careful!

At least in the U.S., GINA (Genetic Information Nondiscrimination Act) makes health insurance or workplace discrimination based on genetic information illegal. It doesn't mean it won't happen, but it's much less likely than it might appear.


Mary Mangan

3 Comments

  • 624 Days Ago
  • 09/03/2011

Re: Careful!

But it doesn't include life insurance protections. And there's plenty of other types of discrimination that are not workplace or health insurance.

But also note: if you disclose your data to an MD, that can become part of your medical record and then possibly actionable by the insurers.

It also hasn't been tested in the courts to any serious degree.


eric25001

26 Comments

  • 622 Days Ago
  • 09/05/2011

CNV's

More data is needed on Copy Number Variation (CNV). What additions, deletions, inversions, premature stop codons, etcetera go along with illness, smartness, good looks, etcetera etcetera?

This needs to be combined with diet and environmental factors going back 100 years to get the parents' and grandparents' diets. EPIGENETICS, anyone?

Please; more data please!


mbloore

39 Comments

  • 621 Days Ago
  • 09/06/2011

billions?

there are billions of genetic markers in each of your cells?  there are fewer than 30,000 genes.  perhaps you meant billions of base pairs?


davidewingduncan

13 Comments

  • 621 Days Ago
  • 09/06/2011

Re: billions?

Yes, I did mean to say "billions of nucleotides," not billions of genetic markers. Earlier in the story I made it clear that I was talking about nucleotides. However, on your point about there being only 30,000 genes, the number is probably more like 25,000, and genes are different from genetic markers. Markers are primarily single nucleotide polymorphisms (SNPs), single "letter" changes that differ from person to person and that appear both inside and outside of genes. SNPs and other markers such as deletions and copy variants currently number in excess of 30,000, and the numbers keep growing. - DED


Pellionisz

6 Comments

  • 621 Days Ago
  • 09/06/2011

Is IT ready for the Dreaded DNA Data Deluge?

David - you are the best person to know that "This second challenge [full DNA interpretation] requires a massive effort akin to the Human Genome Project". As I established in a peer-reviewed science paper, The Principle of Recursive Genome Function, and popularized on YouTube (now at about 12,000 views), the needed Information Theory effort far surpasses the accomplished challenges of "affordable full DNA sequencing" and "Information Technology readiness" to compute whatever one wishes to compute. We can do all the "garbage in - garbage out" endless computing to fry the fastest supercomputers. Yet we will understand very little (worse, the Industrialization of Genomics may crash because of the unsustainable lack of supply-demand balance between Sequencing and Analytics) till a new theoretical breakthrough is implemented in earnest, "SAMSUNG-style". (BTW, are you ready to ship your genome to Seoul??) The genome is fractal. One would be a fool to try to "pin down" maddening dot-patterns of endless renderings of a single Mandelbrot set, with Z=Z^2+C defining the entire "complexity". ("C" is a constant; change it a little [e.g. to c], and the actual dots will change how they overlap with the pattern produced when the constant was capital C, with its somewhat different value.) Does this mean that looking for "structural variants" (already found in the millions, and certainly not limited to single nucleotide polymorphisms) is hopeless? Not at all! The "only" quantum leap many consider now is to embrace my fractal approach and sharply distinguish those structural variants that are "parametric" (like C), and thus do not change the pristine fractality, from "fractal defects", focusing on structural variants that violate the genome's own intrinsic mathematics.


erbium

343 Comments

  • 615 Days Ago
  • 09/12/2011

comparing

your complete sequences with someone else..

Problem is, we are not all 'one sequence' obviously.

While some of the common variations are known, one person was originally sequenced.  11 years on, I'm not sure exactly how many people have been sequenced completely. 

Even this article had no clue.
http://www.nature.com/nature/journal/v464/n7289/full/464649a.html

And for those that have gotten complete sequencing done, how many have the genome available for comparison, or instead are keeping it private so unavailable for science.   Perhaps they could contribute it anonymously.

But unless you become an expert at all the tools mentioned, and there are lots of DNA tools, not all oriented towards analyzing human DNA.

Each of the pages below lists multiple DNA tools, and as you mentioned, they may not be the best for the task at hand. Some cost money.



http://bioinformatics.unc.edu/software/opensource/index.htm

http://molbiol-tools.ca/molecular_biology_freeware.htm

http://jura.wi.mit.edu/bio/dna/

http://www.ebi.ac.uk/Tools/sequence.html

http://users.stlcc.edu/kkiser/DNA.tools.html

http://en.bio-soft.net/dna.html

http://bioweb2.pasteur.fr/nucleic/intro-en.html

http://cellbiol.com/Tools.html

http://www.ncbi.nlm.nih.gov/guide/all/

http://www.bioscience.org/urllists/dnaanal.htm

http://www.jcvi.org/cms/research/software/




one of the comments on a linked page mentioned that DNA is not the complete picture. 
http://biostar.stackexchange.com/questions/11752/what-would-you-do-with-your-personal-genome-data

It is like a blueprint but the actual expression could change.

Ultimately, what would be more useful is an incredibly high resolution map of the physical expression of genes, your body.

You could calculate volumes of any organ or vessels, compare blood vessel branching.  See how many organelles are being digested for recycling, see how many blood vessels have plaque and where, even count your neurons and interconnections.

A Star Trek-like, highly detailed 'microscan'. Of course it will never happen, as measuring in this detail would kill you and we don't have the technology yet anyway. The microtome slice scans of two humans project is a start in this direction. They were of course dead before being microtomed from head to foot.

I think ultimately the commercial services like 23andme have available the bulk of what is particularly useful from your genome. I'm not saying it's pointless; it sounds like a really neat project. But examining the physical expression of your DNA (you) in greater detail than is currently available would be a better helper to assess overall parameters of people's bodies.


Bio

David Ewing Duncan is a journalist and author, and the Director of the Center for Life Science Policy at UC Berkeley. This blog is a companion to his book, Experimental Man - www.experimentalman.com.
