DNA is the information molecule of life. "Interestingly, the DNA of any two individuals on Earth is actually 99.9% alike," says Dr. Phil Reilly, CEO of Interleukin Genetics. "There are 3.1 billion base pairs in the human genome," continues Dr. Reilly, "yet a mere 0.1% of 6.2 billion is 6.2 million differences, which explains our remarkable distinctions." The study of those variations leads to the discovery of genetic predispositions to various diseases, hence the great interest in mapping the human genome and studying it intimately. "Suppose you and I compare our Gene A," adds Dr. Reilly, "and we find that two or three places out of 30,000 bits [for a gene] are slightly different. Perhaps you have a greater incidence of heart disease, and it is related to those bit differences affecting the functionality of a protein that is coded for that gene and now has a different efficiency of some metabolic pathway downstream." Through extremely compute-intensive number crunching and cross-analysis, researchers learn key associations that in turn lead to new drug therapies or treatment protocols, which are possible only through the power intrinsic in the data and the ability to analyze it.
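The arithmetic behind Dr. Reilly's figures can be checked in a few lines. This is only a back-of-the-envelope sketch: the variable names are ours, and the 6.2 billion total assumes that both copies of the 3.1-billion-base-pair genome are counted, which is how the quoted numbers appear to fit together.

```python
# Back-of-the-envelope check of the figures quoted above.
BASE_PAIRS_PER_COPY = 3_100_000_000  # from the text: 3.1 billion base pairs
COPIES = 2                           # assumption: one copy from each parent
DIFFERENCES_PER_THOUSAND = 1         # from the text: 99.9% alike, so 0.1% differs

total_bases = BASE_PAIRS_PER_COPY * COPIES
# Integer arithmetic avoids floating-point rounding in the percentage.
differing_bases = total_bases * DIFFERENCES_PER_THOUSAND // 1000

print(f"{total_bases:,} bases compared")
print(f"{differing_bases:,} expected differences")
```

Under those assumptions the result matches the 6.2 million differences in the quotation.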
Sequencing of the human genome cost approximately $3 billion and took nearly 15 years. The amount of data this project produced is staggering. There are approximately 25,000 genes hidden among the 6 billion bases to express the "generic" human genome. The first step was to acquire the data. But that process provides only a generic set of data for a species (useful, to be sure, for detailed analysis of individual genes and events). However, genetics researchers now project that within as few as 5 to 10 years, they will be able to map an individual's personal genome and do so at a cost of as little as $1,000 per individual. What is the value of handing someone $1,000 to map your own personal genome? Some medical researchers believe that knowing your personal genome (carried with you, perhaps on your PDA or ID card) will allow your doctor to prescribe customized medication regimens to more effectively treat specific diseases. "There is a particular drug for lung cancer, Iressa, that for most of the people taking the drug it is of little or no value," Dr. Reilly explains. "But, for a small number who have a particular gene variant, the drug is of great value for some reasons of protein interaction that we don't fully understand. If we could merely identify ahead of time which people would react perfectly to this drug, we have achieved a major advance in medicine. We could now choose a drug that matches your particular genome and we enter an age of personalized medicine." Reaching this conclusion was only possible through massive information sifting. "We should soon be able to screen for gene variants which will signify risk for many more disorders. This would allow early interventions to avoid becoming ill at all," exclaims Dr. Reilly.
Dr. Reilly believes that with fast mapping and identification of individual genomes, people could potentially avoid contracting thousands of diseases and that this could be done at a very early stage in life, but we need many more genome mappings and analyses (without waiting 15 years and spending another $3 billion). Today, no individual computer available for use in a clinical setting can process such massive quantities of data in a reasonable amount of time. Your laptop, for example, has less than half a terabyte of disk space, and it would take your computer three hours just to read and digest every byte of your 3.1 billion base pairs before it could even begin to process it. To exploit this gene-centric therapy opportunity, the computer world will have to embrace many changes. One widely discussed change is the move to "federated computing," where a great many computers work together on very small parts of a large problem and then make the results generally available. In the Inescapable Data world, the value of information is in its sharing. Fortunately, there is a slow but steady movement in the medical field to allow data to be shared as information (in self-describing XML or other techniques). As a citizen concerned about your own longevity, would you be willing to make available your genome, in its totality or in pieces, for analysis by a large faculty of university-based researchers, or even a young but promising college-level pre-med student with a theory on the best way to quantify prostate cancer risk? Would you make available a fraction of your computer's CPU time and resources as a participant in a global federation of computers dedicated to finding genetically linked cures for certain diseases? In the world of Inescapable Data, both are distinct possibilities.
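The idea of many computers each working on a very small part of a large problem can be sketched in miniature. The example below is an illustration under stated assumptions, not a description of any real federation: the function names are hypothetical, threads on one machine stand in for separate computers, and a simple tally of bases stands in for real genetic analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Cut a sequence into consecutive pieces of at most chunk_size bases."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk):
    """The small job each participant does: tally A, C, G, and T in one chunk."""
    return {base: chunk.count(base) for base in "ACGT"}


def merge_counts(partial_counts):
    """Combine the per-chunk tallies into one shared result."""
    total = {base: 0 for base in "ACGT"}
    for counts in partial_counts:
        for base, n in counts.items():
            total[base] += n
    return total


def federated_base_count(sequence, chunk_size=8, workers=4):
    """Split the work, farm it out, and merge what comes back."""
    chunks = split_into_chunks(sequence, chunk_size)
    # Assumption: worker threads stand in for the machines of a federation.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_bases, chunks))
    return merge_counts(partial_counts)


if __name__ == "__main__":
    toy_sequence = "ACGTTGCAAGGCTTAACCGGTTAACG"  # made-up data, not a real gene
    print(federated_base_count(toy_sequence))
```

The shape is what matters here: no participant needs the whole genome, each returns a small partial answer, and the merged result is what gets shared.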
Simply restated, our Inescapable Data view is that, while analyzing relationships among the elements of a single data set is an interesting and useful endeavor, it is the analysis of relationships between disparate data sets that yields new and possibly more significant value. Suppose, for example, you correlate the electronic records of your visits to the supermarket with your genome and with the genomes, diets, and disease histories of 2 billion other people. You might discover some hidden dangers lurking on those supermarket shelves that could be avoided. What then happens when you correlate those results with the frequency of your visits to the health club? (Okay, we already know that answer.) Impossible? Today, yes: such an analysis is not possible because we lack comprehensive electronic record keeping. But the data is starting to be gathered today, and the connections are being made today that will allow such analyses in the near future.
The famous Framingham Heart Study took 20 years before the researchers, who gathered and studied data from thousands of individuals, produced their findings. That data is now available in some electronic form for other researchers to exploit and attempt to correlate with other data streams. Such studies are taking place continuously in the medical world, and their details and results are increasingly being made available for wider exploitation. Web connectivity and standard electronic records for describing the data (such as XML-expressed information) allow for more rapid integration into other research. What once would have taken a team of programmers weeks to decode from a proprietary database can now be done nearly instantly, even by average citizens (much as average citizens can look up real estate values in their town or search for a blender at WalMart.com). Furthermore, just finding the interesting databases is dramatically simpler due to search tools and a new emphasis on sharing within the medical communities. In the world of Inescapable Data, it is highly likely that the data needed to do meaningful medical research already exists somewhere. As a researcher or even as an individual, you can tap into it and make correlations.
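To see why self-describing records speed up reuse, consider the small sketch below. Everything in it is assumed for illustration: the record layout, the element names, and the values are invented and are not the format or the data of the Framingham study or of any real data set. The point is only that a program can pull out named fields without a custom decoder.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, invented for illustration only.
SAMPLE_XML = """
<study name="example-heart-study">
  <participant id="p1"><age>54</age><systolic>142</systolic></participant>
  <participant id="p2"><age>61</age><systolic>128</systolic></participant>
  <participant id="p3"><age>47</age><systolic>135</systolic></participant>
</study>
"""


def load_participants(xml_text):
    """Turn each <participant> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.findall("participant"):
        records.append({
            "id": node.get("id"),
            "age": int(node.findtext("age")),
            "systolic": int(node.findtext("systolic")),
        })
    return records


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    participants = load_participants(SAMPLE_XML)
    print(mean([p["systolic"] for p in participants]))
```

Because each value is labeled by its tag, a second researcher could combine these records with another labeled data set by matching field names rather than reverse-engineering a proprietary file.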