Weaving together data science tools
September 28, 2014
Sketch by Dean Malmgren
We’ve written a lot about the importance of using the design process to identify the right problem to solve. But when we’re asked “what tools do you use?”, we often skirt the question by explaining how we choose our tools and data sets based on the problems we face, or by describing the tools we’ve used on specific projects.
Our avoidance is deliberate. At Datascope, we don’t believe the same tool can solve every problem. In our experience, we often discover new tools, or develop custom ones, over the course of a project. But we nonetheless appreciate the essence of the question, which is really “how do you build your solutions?”
To reiterate, we do not have a standardized approach, but one thing we’ve become quite good at over the years is weaving together various open source packages to craft data-driven solutions. Keeping the bigger picture in mind, we aggregate data from a variety of sources, choose an appropriate method of storage, build a scalable analysis, and design visualizations that meet our clients’ use cases.
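To make that pipeline concrete, here is a minimal sketch (not from any actual project; the data and schema are hypothetical) that weaves a few Python standard-library tools through the same steps: aggregate records from two source formats, store them, and run a simple analysis whose output could feed a visualization.

```python
import csv
import io
import json
import sqlite3

# Hypothetical data arriving from two different sources in two formats.
csv_source = "city,visits\nChicago,120\nDenver,85\n"
json_source = '[{"city": "Chicago", "visits": 40}, {"city": "Boston", "visits": 60}]'

# Aggregate: parse each source into a common record shape.
records = [(r["city"], int(r["visits"]))
           for r in csv.DictReader(io.StringIO(csv_source))]
records += [(r["city"], int(r["visits"])) for r in json.loads(json_source)]

# Store: load the records into SQLite (in-memory here for simplicity).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE visits (city TEXT, n INTEGER)")
db.executemany("INSERT INTO visits VALUES (?, ?)", records)

# Analyze: total visits per city, ready to hand off to a plotting library.
totals = dict(db.execute("SELECT city, SUM(n) FROM visits GROUP BY city"))
print(sorted(totals.items()))  # [('Boston', 60), ('Chicago', 160), ('Denver', 85)]
```

The point isn’t any one library: each stage could be swapped for a heavier-duty tool (a message queue for ingestion, Postgres for storage, a charting library for output) without disturbing the others.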
When the existing tool set does not meet our needs, we don’t force a square peg into a round hole. We either adapt and patch existing tools (e.g., fork an open source project), create a new open source tool (e.g., catcorr.js or textract), or build a new proprietary tool specifically for our clients (e.g., our work for Daegis).
For beginners, it can be hard to know how to use all of these tools and, perhaps more importantly, how they fit together. To help people get started, we created a visualization of the landscape of open source data science tools and recorded a webinar on how they all fit together. If you find the visualization useful, or would like to contribute a tool we’ve missed, we hope you’ll consider participating in the ongoing support of the project!