Who wouldn’t want to be a data scientist, the latest glamor job of the Nerd World? GigaOm told us how to find one or be one. Wired said you don’t need a PhD in Mathematics to be one. You can even be an amateur data scientist, and Smart Data Collective has picked out sample projects to start with. Unable to resist any longer, I signed up for a month of immersion in data, hoping to emerge a newly minted data scientist.
I went to Turin, Italy, to take part in the first Big Dive, which bills itself as “a training program to boost the technical skills needed to dive into the big data universe.” It would cover development, visualization, and data science.
Our deep dive into big data took place at the Casa del Pingone, a 15th-century house decorated with medieval frescos and equipped with the all-important espresso machine.
By the last day of the course, my project partner Fariba (a gelato-loving Iranian physicist) and I still had nothing for our final presentation the next morning. Our analysis of a week’s worth of Twitter data had led us only up sundry blind alleys. The sight of our classmates purposefully completing their presentations was intolerable. I had fled to the park, where I sat eating pizza and seriously considering not going back. The pizza wasn’t helping; I was a failed data scientist.
The Ice Breaker
My qualifications as a wannabe data scientist were limited to a background in machine learning from the previous millennium, attendance at one Strata Conference, and a few published articles on big data applications. I wasn’t sure I was ready for the challenge, but since even data scientists couldn’t seem to agree on what a data scientist did, how hard could it be?
“The biggest blocker (for scientists) is code,” says Jake Klamka, founder of the Insight Data Science Fellows Program, an intensive six-week postdoctoral training fellowship in data science. “Going from coding in a scientific context to development at the level that technology companies do it, learning computer science fundamentals and software engineering best practices. You can’t walk into an interview at Facebook or Square and say ‘I just do MATLAB.’”
For software developers the challenges are different. “Engineers think about building things,” Klamka explains. “Data science is about asking the right questions. That’s what scientists are phenomenally good at,” he says. Neither the scientists nor the software developers, however, knew anything about visualization.
We were instructed to first acquire and parse the data, then filter out information that was not of interest, mine what remained for patterns, and finally capture the significant patterns in a visual representation. We would use visualization to explore the data, to sketch in code, and to present the final result.
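For readers who want a concrete picture of that acquire, parse, filter, mine, and represent sequence, here is a minimal Python sketch. It is an illustration under stated assumptions, not what the Big Dive course actually taught: the three tweet records are invented, the language filter is an assumed criterion of interest, hashtag counting stands in for "mining," and a text bar chart stands in for a real visualization.

```python
import json
import re
from collections import Counter

# Hypothetical sample data: three raw tweet records as JSON lines (invented for illustration).
RAW_LINES = [
    '{"lang": "it", "text": "Caffè e #bigdata a Torino"}',
    '{"lang": "en", "text": "Diving into #bigdata and #dataviz"}',
    '{"lang": "en", "text": "More #dataviz sketches today"}',
]

HASHTAG = re.compile(r"#(\w+)")

def acquire():
    """Acquire: an in-memory list here; a real run would read a file or a stream."""
    return list(RAW_LINES)

def parse(lines):
    """Parse: turn each JSON line into a dict, skipping malformed lines."""
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records

def filter_records(records, lang="en"):
    """Filter: keep only tweets in the chosen language (an assumed criterion)."""
    return [r for r in records if r.get("lang") == lang]

def mine(records):
    """Mine: count hashtag frequencies, a simple stand-in for pattern finding."""
    counts = Counter()
    for r in records:
        counts.update(tag.lower() for tag in HASHTAG.findall(r.get("text", "")))
    return counts

def represent(counts, width=20):
    """Represent: a crude text bar chart, one bar per hashtag, longest first."""
    if not counts:
        return ""
    top = max(counts.values())
    rows = []
    for tag, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        rows.append(f"{tag:<10} {bar} {n}")
    return "\n".join(rows)

if __name__ == "__main__":
    print(represent(mine(filter_records(parse(acquire())))))
```

Keeping each stage as a separate function mirrors the way the steps were described to us: any one of them can be swapped out (a different filter, a different mining rule) without touching the others.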
“Data is invisible,” said our lavishly mustachioed visualization instructor, Giorgio. “To create a map between visual elements and abstract information, we need a visual language which has the same complexity and capabilities as a spoken language.” …