Seattle is rich in both data-driven research projects and scientists who enjoy designing tools to mine data.
I was asked to talk about that big data heritage here in Seattle as part of a meeting of the Northwest Science Writers Association.
Consider this blog post a cheat sheet for those of you who did not take notes during my presentation or didn’t attend.
1. Big Data is defined as having volume, velocity and variety. If you put sensors on the ocean floor, as researchers are doing at the University of Washington, you will bring back real-time data by the server load on temperature, salinity, and even the genetic sequences of the microorganisms there. The data will stream at high volume and velocity and with amazing variety. But finding insights in that fire hose of data is not so easy. There is a shortage of data scientists who know the best way to parse the flood, according to Professor Ed Lazowska, who commented for my recent story.
Lazowska and a team have pioneered the eScience Institute on campus to nurture data science and provide a meeting place, a sort of water cooler, for the best conversations and exchanges among disciplines.
2. Most of us already understand the retail use of big data, because predictive models of our own buying behavior surround us. Apps that help you choose restaurants, airlines, music and other commodities use models built on the buying habits of thousands of other consumers. The same desire for prediction is now driving medical research. Researchers in Seattle at the Institute for Systems Biology, Fred Hutchinson Cancer Research Center and the Allen Institute for Brain Science are all using algorithms to try to understand vast amounts of data about human disease. One wonderful overview of this new way of seeing human disease was published in Cell in April by Eric Topol.
Among the things Topol envisions in the very near future: “With the power of sequencing, it is anticipated that the molecular basis for most of the 7,000 known Mendelian diseases will be unraveled in the next few years.”
Many human diseases are a complicated mixture of vulnerabilities combined with environment and behavior. Knowing their molecular basis does not mean curing them is easy. But this level of understanding will create new opportunities.
One of the foremost Seattle scientists in genomics is Jay Shendure. You can read why NIH Director Francis Collins praises his newest ambition.
3. For my last comment in this quick overview of Seattle, I want to change the focus of big data from sensors and molecules to the “big” pool of people who are increasingly seen as a resource for research itself. Patients are key players in new efforts to accelerate medical research by drawing in volunteers who share data about themselves. One of the pioneers of this approach is Sage Bionetworks, led by Stephen Friend and John Wilbanks. I wrote an earlier post about Friend when the White House gave him an award. Recently, Forbes wrote about the way drawing on the public may bring faster cures.
Seattle sits at the center of many different strengths in using big data. We have leaders in a variety of sciences, including oceanography and proteomics, but we also have leaders in the creative destruction of the old models for disease discovery. I think that you, as science journalists, can mine many of these data projects for stories.
The photo above shows a poster session at the eScience Institute at the University of Washington.