Genome sequencing for everybody!

As you may know, all of you have a genome. It is made of DNA, which is an instruction set for most of your parts. Sometimes the parts break and tragic disease is the result. Other times, things change and nothing at all seems to happen! What a big mysterious machine the genome is!

Anyway, recent advances in sequencing technology have brought the price of an individual genome sequence down to $50,000: not cheap, but nowhere near the $2.7 billion the original Human Genome Project cost. Other biotech companies believe they can deliver a $5,000 genome within a year or so (editor’s note: they’re already late). There’s also a genome X-prize.

What this means is that, assuming the price drops to $5K/person or less, for roughly the cost of the Iraq and Afghanistan wars, we could have instead sequenced the genomes of all 300 million Americans. This would be an unprecedented data set. If we kept it going for years, we’d learn a hell of a lot about human genetics.

This is assuming the price doesn’t fall even further within the next few years, which I believe will inevitably happen.

My question is:

Would you support a national initiative to collect the genomes of the entire American populace? Assumptions: (1) an individual would not be associable with their genome; (2) participation is voluntary; (3) you retain a copy of your genomic information (the government does as well, but it cannot link it to you in any way).

Obviously security is part of the issue here. Can we maintain sufficient security to ensure that your genome remains private? Cost is also part of the issue. Using near-future technologies, this will still cost about a trillion dollars. We can maybe ameliorate that by having people buy in or by renting the database to biotech corporations, but it is still going to cost a lot. Is it worth it?

Well, if it were given the same legal protections and confidentiality as census data (which admittedly has a few loopholes and abuses to be dealt with and even then will never satisfy the libertarian lunatic fringe), I can’t see the harm.

Actually, I’m not sure how useful it would be to have a bunch of fully-anonymized genomes. If they come with medical histories, then you could find things in the data like “This gene has a 0.7 correlation with earlobe cancer”, or the like, and if they had identifying information, you could then track down the folks with that gene but without earlobe cancer, in the hopes of finding out what it is about them that prevents them from getting it. But of course, there are privacy issues with pushing everyone to get their non-anonymized genomes put in a database.

What I do think would be reasonable would be a government program for opt-in voluntary genomes. Anyone who wanted could get their genome sequenced at government expense, and get to keep a copy of it for themselves or their doctors (which may have benefits for medical treatment for them). The government agency would combine this with the patient’s medical history, and put it into a publicly-available database, identified by a secret ID number. It might be possible to retrieve the name and contact information associated with an ID number, but only under appropriate procedures (what exactly those procedures would be, I don’t know, but I would expect it to be allowed only for situations that affect the health of the individual; certainly, it wouldn’t just be available to, say, an insurance company for the asking). Since it’s strictly an opt-in program, there wouldn’t be any concerns about privacy, but I imagine that the value of a sequenced genome would incentivize enough people that you’d get a huge dataset to work with. And the value of the medical results that came out of the research would probably be enough to justify the governmental support.

Yeah, in my head I wanted there to be some medical information associated with the genomic data, but I didn’t say it explicitly.

The thing about anonymized data is that you would presumably have a larger, slightly less biased pool to work with, whereas your approach might bias the sample in the direction of (1) the poor and (2) nerds who are trusting of the government.

I agree that there are all kinds of safeguards that would have to be put in place, but to have patient histories in addition to the genomic sequence would be the ultimate goal.

What about costs? Would any of our conservatives jump on the biological equivalent of the Apollo program for a cool trillion dollars of taxpayer money?

I don’t think it would bias towards the poor, since the incentive would be non-monetary. In fact, it might even bias rich, since low-cost medical services are less likely to be able to take advantage of the information in a genome. And it’d certainly bias towards technophiles and those trusting of the government, but I’m not sure those are relevant enough factors that the bias would be a problem.

Certainly, you’d want there to be some way for information to continue flowing from the researchers to the genome donors: If I’m a cancer researcher who finds a gene that increases risk for cancer, I’m going to want to let everyone with that gene know they have it. Perhaps the participants could choose one of several levels of anonymity, so your name only goes in the secret database if you allow it to be? Or maybe everyone who participates gets a monthly data update on what genes are of interest, which they can run with a search tool on their own computer to scan their genome.
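For what it's worth, the "search tool on their own computer" idea could be as simple as the sketch below. Everything in it is made up for illustration: the marker names, the shape of the monthly update, and the way a personal genome file is represented are all assumptions, not any real format.

```python
# Hypothetical sketch: the marker IDs, update format, and genotype
# representation are invented for illustration only.

def scan_genome(my_genotypes, variants_of_interest):
    """Return the entries from the update whose risk allele this genome carries."""
    hits = []
    for variant in variants_of_interest:
        genotype = my_genotypes.get(variant["marker"])
        # Skip markers that are missing from the personal genome file.
        if genotype is not None and variant["risk_allele"] in genotype:
            hits.append((variant["marker"], genotype, variant["note"]))
    return hits


# A toy personal genome: marker name -> the two alleles carried.
my_genotypes = {"marker_001": "AG", "marker_002": "CC", "marker_003": "TT"}

# A toy monthly update published by researchers.
monthly_update = [
    {"marker": "marker_001", "risk_allele": "G", "note": "made-up example finding"},
    {"marker": "marker_003", "risk_allele": "A", "note": "made-up example finding"},
    {"marker": "marker_999", "risk_allele": "C", "note": "marker not in this file"},
]

hits = scan_genome(my_genotypes, monthly_update)
for marker, genotype, note in hits:
    print(marker, genotype, note)
```

The point of doing it this way is that the scan runs entirely on the participant's machine, so nobody has to look up who carries what.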

Why do the entire population? A properly chosen sample could get nearly the same knowledge about human genetics for a tiny fraction of the cost. It would seem the only reason to do the entire population would be to provide individual advice to everyone, which is not the goal here.

So have a sample of 5,000 people or so who have volunteered with the appropriate protections for privacy. Their medical history would be correlated with their genome and you could also have some followup analysis to check their subsequent medical developments. There would be some self-selection bias but there are techniques for dealing with that. Assuming $50,000 per person, that would be $250 million, which is peanuts for the US and would be a good use of stimulus money. Add in a few hundred million for administration and analysis and it’s still pretty small. In fact I wouldn’t be surprised if something like this is being done right now.

As a researcher, it is not my primary goal (my primary goal is more data). But there is an ancillary benefit to public health in giving everyone their genomic information, and I think it makes it easier to sell. It says: here, you get something in return for giving us this data. It is a tool that may assist you in being healthy. I think it’s a fair trade.

Also, there are some genetic disorders with incidences of less than 1 in 5,000. A really big sample size has a lot of advantages in genetic research.
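To put rough numbers on the rare-disorder point, here is a back-of-the-envelope sketch. It assumes a simple random sample and independent cases, which real cohorts won't satisfy exactly, and it uses the thread's 1-in-5,000 figure as the incidence.

```python
def expected_cases(sample_size, incidence):
    """Expected number of affected people in a random sample."""
    return sample_size * incidence


def prob_at_least_one(sample_size, incidence):
    """Chance the sample contains at least one case, assuming independence."""
    return 1.0 - (1.0 - incidence) ** sample_size


incidence = 1 / 5000  # the "1 in 5,000" figure from the thread

for n in (5_000, 1_000_000, 300_000_000):
    print(
        f"n={n:>11,}  expected cases={expected_cases(n, incidence):>10,.1f}  "
        f"P(at least one)={prob_at_least_one(n, incidence):.3f}"
    )
```

With 5,000 people you expect about one case and have a decent chance of seeing none at all; with a million you expect around 200, which is enough to start doing statistics on.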

Do you need the entire population though? Why wouldn’t 1 million well-selected people suffice? With exit polling, you can get a rough approximation of how people voted based on a smaller sampling. Genomics should hopefully be similar in scope, with 1 million well-selected people offering a variety of benefits to genomic research, since they will represent humanity as a whole fairly well (or at least the US population as a whole).
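The exit-polling comparison can be made concrete with the standard margin-of-error formula for a proportion. This is only a sketch: it assumes a simple random sample and a 95% confidence level (z = 1.96), and it describes estimating how common a variant is, not detecting rare disorders.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (1_000, 5_000, 1_000_000):
    print(f"n={n:>9,}  margin of error = +/-{margin_of_error(n) * 100:.2f} percentage points")
```

For a common variant, a million people pins the frequency down to about a tenth of a percentage point, so the returns from going to 300 million are small. The case for a bigger sample rests on rare variants and rare disorders, where the absolute number of carriers is what matters.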

With genomes at $5,000, I would support a public plan that used subsidies (maybe a $2,000 tax credit) to help individuals spend their own money to get their genomes sequenced and used for both personal purposes and medical research (knowing their public genome wouldn’t be traced to them).

But at $5,000 a person, that is $1.5 trillion for the entire nation if it were funded solely via public funds. A program that used a $2,000 subsidy on 1 million people would only cost $2 billion and would still dramatically advance genomic research.
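For anyone who wants to check the arithmetic, here it is spelled out using only figures quoted in this thread (300 million people, $5,000 or $50,000 per genome, a $2,000 subsidy). None of these are real budget numbers.

```python
US_POPULATION = 300_000_000      # figure used in the thread
PRICE_PER_GENOME = 5_000         # hoped-for near-future price, in dollars
CURRENT_PRICE = 50_000           # price quoted in the opening post
SUBSIDY_PER_PERSON = 2_000       # proposed tax credit
SUBSIDY_PARTICIPANTS = 1_000_000
PILOT_PARTICIPANTS = 5_000

full_population_cost = US_POPULATION * PRICE_PER_GENOME
subsidy_program_cost = SUBSIDY_PARTICIPANTS * SUBSIDY_PER_PERSON
pilot_cost = PILOT_PARTICIPANTS * CURRENT_PRICE

print(f"Everyone at $5,000 each:         ${full_population_cost:,}")
print(f"$2,000 subsidy for 1 million:    ${subsidy_program_cost:,}")
print(f"5,000-person pilot at $50,000:   ${pilot_cost:,}")
print(f"Full program / subsidy program:  {full_population_cost // subsidy_program_cost}x")
```

So the subsidy plan comes in at roughly 1/750th of the cost of sequencing everyone on the public dime.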

Absolutely. It would bring your private-insurance-based healthcare system to its knees, since affordable insurance would be denied to anyone with a predisposition in their genes as a “pre-existing condition”. It would also have an enormous effect on your absurdly high murder rates, since crime-scene DNA would be instantly matched with, at the very least, the murderer’s immediate family (in fact, you only need a few million on the database to provide this).

In the UK the benefits would be less obvious, but what the heck - sign me up.

This is something I thought about a lot. I’m not sure how I feel about using the database to solve crimes. It would inevitably be proposed by some legislator. It would probably also be super effective in deterring/solving crimes. But I worry that there would be a chilling effect on participation in the study if people knew the government could use it to link them to a crime. I don’t even know if such a project would be consistent with the current regulations on collecting evidence…

I kind of like this idea the most. It gives people the tools necessary to learn about their own bodies and take charge of their own healthcare to some degree. I prefer it to the government emailing me every month and saying “OMG YOU HAVE THE CANCER GENE!”

I’m not sure I would support this, because of privacy concerns. Is it really possible to have an anonymous database and keep track that you aren’t duplicating or skipping information? Suppose you have a gene that puts you at risk for a debilitating or fatal disease, and genome information associated with your name is mistakenly made available to the public. Do you trust that your insurance company (or your employer) will not misuse the data? It’s hard to think of a way to unring the bell on this one.

Also, if you want to associate medical information with each genome, what medical information would be included? The entire medical history without the patient’s name and SSN? If you include things like age and location, you start getting into the realm where it’s possible that the “anonymous” medical information would in some cases be enough to identify the patient.
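One common way to put a number on this worry is a k-anonymity style check: count how many records share each combination of "harmless" fields, and flag anyone who is in a group smaller than k. The sketch below is only an illustration; the records, the field names, and the choice of k are all made up.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=5):
    """Return records whose quasi-identifier combination is shared by fewer than k people."""

    def key(record):
        return tuple(record[field] for field in quasi_identifiers)

    group_sizes = Counter(key(record) for record in records)
    return [record for record in records if group_sizes[key(record)] < k]


# Made-up records with no names attached.
records = [
    {"id": "0001", "birth_year": 1970, "state": "OH"},
    {"id": "0002", "birth_year": 1970, "state": "OH"},
    {"id": "0003", "birth_year": 1970, "state": "OH"},
    {"id": "0004", "birth_year": 1912, "state": "WY"},
]

flagged = risky_records(records, ["birth_year", "state"], k=3)
print([record["id"] for record in flagged])
```

Here the lone person born in 1912 in a small state stands out even without a name. The more fields you attach (diagnoses, occupation, and so on), the smaller the groups get.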

I would oppose the project, consistent with my general libertarian leanings and concern for privacy.

How about we start small?

Tiny American flags for everybody!

I believe so. I’ve seen a lot of epidemiological data sets, and they have unique patient IDs associated with them instead of names. It’s done all the time. There can always be errors in entry or maintenance of course, but we already handle data this way.
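A minimal sketch of how that kind of ID scheme can work, assuming two separate stores: a research table keyed by a random study ID, and a linkage table (kept somewhere else, under stricter access) that maps the ID back to a name. The function and table names are hypothetical, and a real system would need far more than this.

```python
import secrets


def enroll(name, record, research_db, linkage_table):
    """Store a record under a random study ID and keep the name in a separate table."""
    study_id = secrets.token_hex(8)
    # Regenerate on the (very unlikely) chance of a collision, so no ID is reused.
    while study_id in research_db:
        study_id = secrets.token_hex(8)
    research_db[study_id] = record      # what researchers see: no name
    linkage_table[study_id] = name      # held separately, restricted access
    return study_id


research_db = {}
linkage_table = {}

study_id = enroll("Jane Example", {"birth_year": 1970}, research_db, linkage_table)
print(study_id, research_db[study_id])
```

Because every participant gets exactly one ID, you can check for duplicates or gaps in the research table without ever touching the names.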

Yeah, it’s a good point. I envision using birth location and birth year as well as a brief medical history (the kind you get every time you go to the doctor).

A piece of information that would be extremely helpful from a scientific standpoint but possibly infeasible would be to have pedigree data. That is, subject 0003 was the son of subjects 0001 and 0002. But that’s data you can’t really get unless you follow up with people, thereby demolishing the anonymity.

Abortions for some! Twirling towards freedom!

Overall, your proposal is very interesting but I’m still worried about privacy issues. Regarding the database and patient ID numbers, it seems like it would be harder to create an anonymous database and also have a mechanism by which individuals can be sent a copy of their individual genome information if they want it. It seems like there has to be a stage at which the government, or its contractor, has the genome information connected with the patient’s name. Wouldn’t the govt need to keep a record of what genome information was sent to each patient, in case the recipient believed that information was sent to them in error (e.g., the person receives their profile and realizes that the medical history is not a match for them)? I’m not being argumentative, and I know next to nothing about databases. I’m just trying to figure out what is possible.

I agree that associating the genomes with pedigree information would be very valuable from a scientific standpoint. So would a lot of other things, e.g., associating with field of employment could provide information about chronic exposure to chemicals or radiation, exposure to chronic stress, etc.

I think there may be other problems, too. I’m about to get into the realm of science fiction here, but what if the database makes it possible to identify a gene mutation that is associated with violent criminal behavior in 20% of carriers? Would you want to know that you have a gene mutation that causes highly aggressive breast cancer, or an incurable early onset dementia, in 20% of carriers?

Such is the ethical world of genomics. Huntington’s chorea is an incurable, fatal disease that is absolutely dependent on a CAG repeat region on chromosome 4. The length of the repeat (i.e. how many times CAG appears in a row, as in CAGCAGCAG) varies from person to person, and that length determines with some precision the age of onset of the disorder.
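To show what "length of the repeat" means in practice, here is a toy sketch that counts the longest uninterrupted run of CAG in a DNA string. The example sequence is invented, and real repeat sizing is done with dedicated lab and software methods, not a regex.

```python
import re


def longest_cag_run(sequence):
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    # Each run is a string like "CAGCAGCAG"; divide its length by 3 to count units.
    return max((len(run) // 3 for run in runs), default=0)


toy_sequence = "TTCAGCAGCAGAACAG"  # invented example, not a real gene sequence
print(longest_cag_run(toy_sequence))
```

In this toy string the longest run is three CAG units; the stray single CAG at the end doesn't extend it.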

Everyone can choose what they want to know. We test for HIV because, if you know, you can act on that information to reduce harm to yourself and others. Why not test for Huntington’s?

Why bother with a government program? This is one case where the private sector is handling it quite admirably. Creating a bloated Federal program that is completely unnecessary to handle a job that’s being handled without its interference is stupid.

Two reasons. One, it ensures the data is publicly available for free. This is a huge deal for academic researchers, who can’t always afford to throw a lot of money at another biotech company. Two, it puts a mechanism in place whereby people who might not be able to afford the cost of sequencing ($5,000 is too much money for most people to drop on this sort of thing) could still be included in the data. That is, it incentivizes genome sequencing to the point where pretty much everyone could and would do it. I think it would be a better data set as a result.

I suppose it’s conceivable that a biotech company could collect genomic sequence from a lot of people and turn around and sell that data set to researchers, but I think most researchers are better served by a government intervention in this case.