AnalyticOps: Part 1 - What is an Analytic Anyway?

Or is an “analytic” just a fancy term for a “business rule”? 

Over the next few blog posts, I am going to tackle the subject of AnalyticOps, a relatively new function that organizations are going to have to implement and master to ensure maximum ROI on data science investments.  I’ll start with some basics of defining just what is “an analytic” and move through some key elements of implementing the tools (an Analytic Engine) and the competencies that make up AnalyticOps.

To speak of "an analytic" is jargon in the context of modern information processing systems. Perhaps second only to “big data” in its over-use, “analytics” are pervasive on the minds of investors, executives and the layman alike.  So let's define the jargon. I always suggest folks keep in mind that jargon is neither reality nor rigorous, so I like to put jargon in double quotes to point out that I’m not making a definition for you to argue with.  With that said: 

"An analytic" is a "process",  "algorithm", or "technique" which is generally "mathematical" in essence which takes "data" or "information" as an input and outputs "actionable insights".  Whoa that is a definition of jargon terminology that is full of jargon terminology.  As is often the case, we see people using misunderstood words to describe further misunderstood concepts. Let's try a couple of different ways to outline a reasonable definition of "an analytic" by describing some typical properties, including what "an analytic" is NOT so much like, and providing a few possible examples.

First we should describe a few typical properties of "an analytic".  Generally an organization needs "an analytic" when it wants to apply a fairly rigorous and generally mathematical "analytical technique" to a tangible real world problem or set of observations.  Let’s take a concrete example.   What if you wanted to predict "reasonably well" where a pumpkin shot out of a cannon (e.g. a pumpkin chunker https://www.punkinchunkin.com/) was going to land during the national competition in Arkansas?  You’d start by transforming the physical space into X,Y,Z coordinates and apply some high school math to initial conditions and output your guess.  Almost all "analytics" you'll run into in modern information processing follow the same steps of this simple example, so let's break it down a bit.

As an aside, most "analytics" you'll run into have an interesting history full of controversy, treachery, and triumph and were developed over time by groups of smart scientists, mathematicians, and/or data scientists.  Most of us regular Joes will spend our time "deploying" or "applying" analytics rather than discovering, creating, and/or inventing them. If you happen to know or run into the folks responsible for developing the analytics you are using, thank them and tell them how you are using them... you might be surprised how happy they are to hear that their hard work is useful.  Asking them questions like "does everyone agree on your method?" might bring out fascinating stories.

In the example above, "the analytic" is the well-known math from high school physics which takes force, acceleration, mass, and some squared terms and outputs a predicted path.  Even though we have "an analytic" in our hands for pumpkin impact location predictions, there are still a few steps to "applying" or "deploying" our analytic at the Arkansas championship!  First is the mapping of our real world space and governing forces in that big windy field in Arkansas to the idealized X,Y,Z coordinates, force metrics, etc. that our "analytic" requires to "compute" a guess or "prediction" in a repeatable way that gives us reasonably good confidence.  When "deploying" or "applying" "an analytic", this process of mapping or transforming the tangible problem into the idealized world of the "analytic" is often called feature creation, featurization, feature space transformation, and probably twenty other jargon terms.  The key point is that we are changing our initial tangible inputs from the specific situation to an idealized, usually numeric, "feature space" where the analytic can do its analytic duty in the most general way without getting bogged down in unimportant or prohibitively messy details.

This "feature space" is often very non-intuitive and of high dimension, and therefore disorienting to those of us who did not actually develop the analytic. Our pumpkin example requires a math aptitude of at least geometry (X, Y, Z coordinates) to understand the feature space.  Many modern information processing analytics are in very high dimensions (sometimes millions or more) and require some decent statistical background to truly understand.  So, as you can see, one key property of most analytics is that they are fairly abstract. This disorienting abstractness really becomes a sticking point between people who develop analytics and real world practitioners who have a job to do!  "I have to chuck this pumpkin! Stop talking to me about vector spaces!" It should also be noted that the process of getting from all the dirty details of the specific problem to the more generalized "feature space", in such a way that the method is useful, is itself an analytic problem. We'll cover more of that detail in a later installment. For now, we are looking at the big picture and want to simply point out that an analytic "operates" in an idealized, usually numeric, often high-dimensional mathematical space; and when we deploy an analytic we need to deal with this.

Finally, once "the analytic" predicts a result, in our case the likely location of the pumpkin impact, we need to "transform" the answer back into the real farmland in Arkansas so we can be confident of building a winning chunker.  We have to account for the "input/output" boundaries where these "transforms" happen.  Whether these "transforms" are considered part of the "analytic" or not is a matter of taste, debate, and food fights... For our purposes we will abstract that out and say that "feature transforms" are "somewhat different" from "an analytic", or at the very least call them "analytics to be applied to the data earlier or later in the analytic pipeline".
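To make the pumpkin walk-through a bit more tangible, here is a minimal Python sketch of the three stages described above: the transform into the idealized space, "the analytic" itself, and the transform back to the field. It is only an illustration under loud assumptions: no wind or air drag, perfectly flat ground, and the textbook range formula R = v^2 * sin(2*theta) / g standing in for "the analytic". The names Launch, feature_transform, analytic, inverse_transform, and predict_impact are hypothetical, made up for this example.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class Launch:
    """Specific inputs as someone at the field might record them (hypothetical fields)."""
    speed_mph: float       # muzzle speed in miles per hour
    elevation_deg: float   # barrel angle above the horizon, in degrees
    heading_deg: float     # compass heading of the shot: 0 = north, 90 = east
    cannon_east_m: float   # cannon position on the field map, meters east
    cannon_north_m: float  # cannon position on the field map, meters north


def feature_transform(launch):
    """Map field measurements into the idealized units the analytic expects."""
    speed = launch.speed_mph * 0.44704          # mph -> meters per second
    angle = math.radians(launch.elevation_deg)  # degrees -> radians
    return speed, angle


def analytic(speed, angle):
    """Assumed analytic: drag-free range on flat ground, R = v^2 * sin(2*theta) / g."""
    return speed ** 2 * math.sin(2 * angle) / G


def inverse_transform(launch, range_m):
    """Turn the idealized range back into a spot on the real field map."""
    heading = math.radians(launch.heading_deg)
    east = launch.cannon_east_m + range_m * math.sin(heading)
    north = launch.cannon_north_m + range_m * math.cos(heading)
    return east, north


def predict_impact(launch):
    """Chain the stages: inputs -> feature transform -> analytic -> inverse transform."""
    speed, angle = feature_transform(launch)
    return inverse_transform(launch, analytic(speed, angle))


# Example call with made-up numbers; prints the predicted (east, north) in meters.
print(predict_impact(Launch(100.0, 45.0, 90.0, 0.0, 0.0)))
```

Notice that only the analytic function contains any physics. Everything a practitioner at the field cares about (miles per hour, compass headings, map positions) lives in the two transforms, which is exactly why people argue over whether they count as part of "the analytic".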

Now we can consider the defining steps to achieve value from “an analytic”:

 "specific inputs" -> "feature transforms" -> "analytics" -> "inverse transforms" -> "actionable outputs".

So how is that different than "business rules"?  Analytics and Business Rules look strikingly similar in my experience; however, there are core elements that differentiate them and must be managed.  We will cover that in part 2 of this series on Thursday.

Written by Stu Bailey