Demystifying Data Science
November 10, 2016
Data science is the buzzword du jour in many circles. While most who are interested in the field have an inkling of its importance, outsiders may find it difficult to pinpoint exactly what data science entails. Seeing the need for more clarity, Metis Chicago coordinated an event focused on demystifying data science for those intrigued by the field or interested in breaking into it themselves.
At the event, professional data scientists with different backgrounds came together and spoke about their personal journeys into data science, as well as the traits and skills they considered essential to getting there. The panel discussion that followed was moderated by Lorena Mesa of Sprout Social and continued to touch on these key concepts. The panelists were Jess Freaner of Datascope Analytics, consultant and author Jeremy Watt, Aaron Foss of LinkedIn, and Greg Reda of Sprout Social.
What Defines a Data Scientist?
To start the conversation, Lorena asked an open question: “What defines a data scientist?” While many definitions connect data science with the use of complicated code and advanced statistics (which can play a part), Greg started off with a simpler, all-encompassing answer.
“A data scientist is really anyone that uses data to answer questions or solve problems.”
Whether the goal is predicting future events or learning from prior ones, one of the biggest roles a data scientist plays is finding answers to questions that arise during the course of business. Jess added:
“I think it's a lot about defining problems in quantitative ways. Figuring out how to frame it in a measurable way and then figuring out how you can test that solution that you're making and then redefining it over and over again… to make better decisions based on data.”
Jeremy Watt, author of Machine Learning Refined, compared data science to a 21st-century empiricism. As with empirical knowledge, once a data scientist has done their work, we don’t have to sift through mounds of information ourselves to reach conclusions and gain insights. From their output, we can make reasonable, informed decisions.
A day in the life
The panel touched on how the day of a data scientist can vary widely depending on where they end up working. In external consulting or on in-house innovation teams (i.e., effectively in-house consultants), projects change more rapidly and vary more in kind. Meanwhile, on in-house teams dedicated to products, data teams get to follow the life of projects long-term and continually build on prior work.
In either case, much of the work involves understanding systems and problems, or as Aaron put it, “disentangling what goes on the left and right hand sides of the equation.” This involves figuring out what information you have as input from your client, getting a firm grasp of what the actual end-point is (because this is often fuzzy and poorly defined at the start), and finding the solution that best brings you to that end-point.
“A lot of stuff that…you just have to do in this field [involves] finding data, cleaning it, interacting with a client and trying to understand exactly what it is they're trying to get out of this [data] that they've just given you. Then, on the coding side, I would say I spend about half my time just pounding my head against the keyboard and the other half Googling why are these errors here, what the hell are these errors?” - Jeremy
Working together + Googling like a pro
As Jeremy’s comment suggests, being a professional data scientist doesn’t mean you will always have the answers. If anything, much of data science involves asking the right questions to figure out what you need to learn, and then learning and adapting quickly.
Fortunately, data scientists don’t have to figure out how to ask these questions all on their own. What many outside the profession might not realize is the amount of collaboration and teamwork involved in the work.
“Collaboration is really a much bigger part of this than I think I imagined when I was getting into it.” - Aaron
Greg added that he tends to think of collaboration from a “complementary skills standpoint.” “I think it's important to level everyone up by having… people that are stronger in certain areas that can bring everyone to a certain level and be there to lean on one another.”
In short, the answers are out there. It’s a matter of how well you can collaborate with your colleagues, come up with the right queries for Google or Stack Overflow, and wrangle everything into a final solution.
Do I need to be a programmer?
Well, yes, but it’s nothing to be afraid of. A data scientist needs to be able to get their hands dirty with the data, whether that means cleaning, munging, or mining it. These tasks, among others, are done programmatically, so coding skills are a must.
However, learning to program is similar to learning any other language. Don’t expect to be good at it right off the bat, or even after a year of hard work. Jess spoke to this when she said:
“The first time you pick up a different language it’s going to be like kindergarten, or first grade grammar, and then eventually you’re going to be able to write a novel. You don’t start off being able to write a novel.”
Additionally, while talent can help you progress, a significant amount of one’s success is based on attitude and drive to improve and expand upon one’s skills.
“I try to learn something new every day.” - Jess
“You don’t need any talent at all to do data science and machine learning… what you have to be is tenacious and focused and trying to think in the simplest possible way all the time.” - Jeremy
How do I get started? - Answer: get started!
“Find something you want to work on, find something you're curious about and then worry about the tooling later.” - Greg
Many people interested in data science aren’t sure where to start in pursuing a new career. While education and the willingness to search for solutions on your own are always critical, the first step is to find a project or idea that captures your interest.
There are certainly technical skills required to perform in this field, and these can be obtained in many ways – from self-teaching, to bootcamps like Metis, to formal education. But the resolve to not give up when you are trying to solve a problem is something you have to find within yourself. Find a question or subject about which you are personally passionate, and use that fire to keep you motivated to push harder.
Remain open to using outside sources and input from others when you reach an impasse, while focusing on a dataset or question that really pulls you in. Even if your initial progress is slow, that curiosity will keep you interested while you learn.
As one can see, in discussing what it takes to be a data scientist, the panelists didn’t talk so much about “hard skills” as they did about personality traits and mindsets. Although coding ability and statistical knowledge are necessary to the field, any knowledge gaps can be filled in with diligence. To dive into the field and excel, what one needs more than anything else are curiosity, tenacity, and the drive to continually learn and improve.