DSC Weekly Digest 27 July 2021
There are fashions in technology that are every bit as ephemeral as fashions in the garment industry. For a while, all data was BIG DATA, then data warehouses were cool, then data lakes became the gotta-have look for the year. Data science had its heyday, and everyone had to stock up on PhDs, then knowledge graphs gained a brief bit of currency, like a particularly frilly collar or gold chains. DevOps was hot and everyone wanted to be a DevOps tech, then machine learning was hot and everyone became a machine learning guru. Yesterday we were arguing about whether R or Python was the next big thing, and today it’s shifted to AutoMLOps vs. AIOps.
Everyone is currently chasing the holy grail of becoming a data-driven company, often with at best only a very faint idea of what that actually means. Every so often, it is worth stepping off the carousel and letting the brass ring go past.
In general, data can be thought of as records of the events that take place around a person or an organization, made as those events occur. Some of this information is a record of the events themselves, such as sales transactions. Some of the data is contextual metadata that puts the events into perspective.
It’s worth noting that some of this data has no relevance to you or your organization and can be referred to as noise, while other data does have relevance and can be referred to as signal. Unfortunately, there is no explicit guide to what is noise and what is signal until you have a question or query to ask. The biggest problem most organizations face is that they tend to hold on to transactional data in preference to metadata, simply because transactional data is easiest to capture, even though it is frequently the metadata that holds the answer to their queries.
Data analytics, at its core, is the art of knowing how to ask the right questions. Not surprisingly, data analytics is stochastic or probabilistic in nature, because it is based upon the assumption that people and organizations that acted a certain way in the past will continue to do so in the future. This is true so long as the conditions that applied in the past also continue into the future, and because people’s behaviors have a certain degree of momentum, it is even somewhat true when the conditions change, at least for a little while. However, the future is notoriously fuzzy around inflection points, where events change in dramatic ways, and in those times a good data scientist is worth their Ph.D.-enhanced salary.
A data-driven organization, then, is one that both practices good “data hygiene” in the acquisition and preparation of data (typically by attempting to determine semantics or meaning in that data independent of the form of that data) and uses that data not only to read the tea leaves but also to change its behavior in response to changes in the data. Failure to change when the model indicates that change is warranted makes everything else that happens in the data process moot: it is an exercise in adding process without using that process for something positive.
In many respects, the goal of being data-driven, then, is to make the organization become aware in the same way that an animal is aware of its surroundings and can react when those surroundings change, or the way that a seasoned captain aboard a sailing ship can read the sky and know whether to unfurl the sails to catch favorable winds or to furl them to protect the ship from storms.
A data-driven organization is one that is capable of discerning the signal from the noise and acting in response. All else is marketing.
In medias res,
Kurt Cagle
Community Editor,
Data Science Central