Design, Math, and Data

December 6, 2013

This post is cross-posted at O'Reilly Strata.

When people hear someone say, “that is a nice infographic” or “check out this sweet dashboard,” many infer that the infographic or dashboard is “well-designed.” But creating accessible (or, for the cynical, “pretty”) content is only part of what makes good design powerful. The design process is geared toward solving specific problems. This process has been formalized in many ways (e.g., IDEO’s Human Centered Design, Marc Hassenzahl’s User Experience Design, or Braden Kowitz’s Story-Centered Design), but the basic idea is that you have to explore the breadth of the possible before you can isolate truly innovative ideas. We at Datascope Analytics argue that the same is true of designing effective data science tools, dashboards, engines, and so on: to design effective dashboards, you must know what is possible.

As founders of Datascope Analytics, we have taken inspiration from Julio Ottino’s Whole Brain Thinking, learned from Stanford’s d.school, and even participated in an externship swap with IDEO to learn how the design process can be adapted to the particular challenges of data science.

We’ve tried many things over the last four years, and we thought we would take a few moments to share what has worked and what hasn’t, in the hope that others might also share successful techniques for designing as data scientists. In particular, we have found six techniques especially helpful for designing with data: Iteration, Quick ‘n dirty, Surrogate data, Simplicity, Explainability, and Transparency. We often combine them in Ideation Workshops with our clients.

Iteration. Because stakeholders are oftentimes unaware of what is possible with data, we have found it invaluable to iterate on concepts with them to identify the specific tools, dashboards, methodologies, data sources, etc., that address specific needs. Whether we iterate in a shared document, in a sketch, or in a dashboard is irrelevant. The key is iterating to find a common language and to explore the realm of the possible with our stakeholders.

Quick ‘n dirty. It’s easy to say ‘iterate’ but it’s much harder to do in practice. One approach is to build functional interactive dashboards that illustrate a concept. This is great if you’re a pro and can build something in a few hours, but more often than not a functional dashboard takes a day, a week, or more to create. That investment gives people pause when giving feedback: will I offend this person, who spent all that time on something that isn’t meeting my expectations? In our experience, a better approach is to sketch dashboard mockups on a whiteboard (or with marker and paper, if you prefer) to get feedback. Since you can’t include much detail with a thick-tipped marker, and it is obvious you didn’t spend much time on the sketch, you can illustrate a concept in a matter of minutes and quickly share it to get honest feedback.

Surrogate data. Oftentimes the real data for which we are designing solutions is (i) very big, (ii) highly confidential, or (iii) ensnared in bureaucracy or legacy systems. All of these things slow down iteration cycles and make it difficult to ensure that the ultimate design is useful. One approach we have come to rely on is surrogate data: data that is similar in nature to the actual data, but is either synthetically generated or drawn from a freely available source with similar essential characteristics. Because it is a surrogate, we can put it on our laptops and run all analyses locally without having to worry about the challenges that come with hugeness, confidentiality, or data bureaucracy.
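As a minimal sketch of the synthetic route, assuming the only thing you can take out of the client’s environment is a handful of aggregate statistics, the Python below generates surrogate records whose amount distribution and category mix roughly match those aggregates. The helper names (`lognormal_params`, `make_surrogate`), the columns (`amount`, `category`), the log-normal shape, and the numbers in the usage block are all hypothetical choices for illustration, not a description of any particular client’s data.

```python
import math
import random
import statistics
from collections import Counter


def lognormal_params(mean, sd):
    """Convert a target mean/sd into the (mu, sigma) of a log-normal distribution."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def make_surrogate(n_rows, amount_mean, amount_sd, category_weights, seed=0):
    """Generate synthetic records that mimic a few summary statistics of real data.

    Only aggregates (a mean, a standard deviation, category frequencies) are
    needed, so the raw records never have to leave their secure environment.
    Hypothetical helper: the columns and distribution are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed: the same surrogate every run
    mu, sigma = lognormal_params(amount_mean, amount_sd)
    categories = list(category_weights)
    weights = [category_weights[c] for c in categories]
    rows = []
    for i in range(n_rows):
        rows.append({
            "id": i,
            # Log-normal keeps amounts positive and right-skewed (an assumed shape).
            "amount": round(rng.lognormvariate(mu, sigma), 2),
            "category": rng.choices(categories, weights=weights, k=1)[0],
        })
    return rows


if __name__ == "__main__":
    # Hypothetical summary statistics; in practice they would be computed on the real data.
    surrogate = make_surrogate(
        10_000, amount_mean=52.0, amount_sd=30.0,
        category_weights={"online": 0.6, "store": 0.3, "phone": 0.1},
    )
    print("mean amount:", round(statistics.mean(r["amount"] for r in surrogate), 2))
    print("stdev amount:", round(statistics.stdev(r["amount"] for r in surrogate), 2))
    print("category counts:", Counter(r["category"] for r in surrogate))
```

Matching a mean, a standard deviation, and category frequencies is deliberately crude: it preserves nothing about correlations between columns or patterns over time. That is often enough to mock up and iterate on a dashboard, but if those relationships drive the design, the surrogate needs to reproduce them too.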

Simplicity. It’s easy to create overly complicated dashboards; all it takes is one more gadget or one more metric. Look around: have you ever actually looked at the engine temperature gauge in your car? When we work with our clients to address real needs, we always strive to do so by dedicating a single dashboard view to a specific use case.

Explainability. How many times have you heard someone say “blah blah autoregressive blah” or “yadda random forest yadda yadda”? These techniques are extremely valuable and insightful, but unless the end user can understand what the fuck is being done in the first place, our fancy algorithms don’t stand a chance of being useful. Case in point is the Netflix Prize: the winning entry improved the accuracy of Netflix’s recommender system by 10%, but it was so complicated that Netflix never put the full solution into production.

Transparency. As Kate Crawford has articulated so prominently over the last year, transparency is crucial so that stakeholders can appreciate any shortcomings that might be present in the data. Are the data biased? Are the analyses flawed? As data scientists, it is our obligation to acknowledge these things in an up-front manner so that we can interpret our results with an appropriate amount of skepticism and uncertainty.

We use these techniques throughout our projects to continuously refine our dashboards, algorithms, and data collection to make sure that the end result addresses a real need. In particular, we use these concepts during Ideation Workshops. The challenge for most of our clients is usually not identifying problems that need to be solved, but exploring how data can be used to solve them. We’ve tried short in-person meetings, video calls, and written communication, but nothing comes close to the efficacy of meeting in person for a 2-3 day workshop.

If you have other ideas for approaches that have worked for you, please drop them in the comments in the cross-posted article. We’re continuously evolving our process and we’d be curious to know what you have learned!