AnalyticOps: Part 2 - What is an Analytic Anyway? cont.


or Is an “analytic” just a fancy term for a “business rule”? 

On Monday, I outlined my view of what makes “an analytic” and the jargon that goes along with it. I ended the post wondering whether “an analytic” is just another name for the “business rules” that organizations tend to follow today. In my view, the main difference between the two, which might be no difference at all in a given situation, is that "analytics" are generally more complex mathematically and operate on a more general "feature space". This generalization gives rigorously developed techniques from statistics and applied mathematics a chance at being applied to messy real-world problems.

Business rules tend to be more human-understandable and more directly embedded in the specific data formats or information processing software used by a business. Not all "business rules" make sense as "analytics", but many "analytics" map into useful "business rules" as "actionable insights".
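To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical: the order fields, the weights, and the threshold are made-up illustrative values, not taken from any real system or fitted to any data. The first function is a business rule written directly against one record layout; the second is an analytic that scores a generic feature vector and is then mapped back into a rule.

```python
import math
from typing import Dict, List

# Hypothetical business rule: easy to read, but tied to one specific record layout.
def flag_order(order: Dict) -> bool:
    return order["total_usd"] > 500.0 and order["country"] != order["card_country"]

# Hypothetical analytic: a logistic score over a generic feature vector.
# WEIGHTS and BIAS are illustrative numbers, not a fitted model.
WEIGHTS = [0.004, 1.5]
BIAS = -3.0

def score(features: List[float]) -> float:
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

# The only step that knows about the record layout is the feature mapping.
def to_features(order: Dict) -> List[float]:
    mismatch = 1.0 if order["country"] != order["card_country"] else 0.0
    return [float(order["total_usd"]), mismatch]

# Mapping the analytic back into a rule: an "actionable insight".
def flag_order_by_score(order: Dict, threshold: float = 0.5) -> bool:
    return score(to_features(order)) > threshold
```

In this sketch the scoring function never sees field names, so it could be swapped, retrained, or managed separately from the code that reads orders.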

Notice that in this definition "analytics" or "an analytic" is a fairly abstract concept similar to "information", and it might need to be managed as an abstraction in the way that "information" can be abstractly managed by "a database". Some of you youngsters might not realize that there was a time not too long ago when "databases" didn't exist. Computer people just mixed their "information" or "data" with the programs that they were writing, and programming languages had different degrees of support for managing various types of data for specific types of problems. Languages like Fortran, RPG, COBOL/SNOBOL, and PL/I all pre-dated the concept of a "relational database" and had different ways of managing "information" or "data" in programs written for them.

Eventually the “relational database” emerged as an extremely useful abstraction to separate information from the programs that processed it. Relational databases quickly became the dominant abstraction, and interfaces to them became commonplace throughout the information processing world via the now well-known standard called the Structured Query Language (SQL).

We face a similar situation today, where "analytics" are usually woven into general programming languages like Python or Scala. To blur boundaries further, special tools like R and Julia are widely adopted, but they are not well integrated with modern data management stacks. Some folks use monolithic platforms like SAS to develop, manage, and deploy analytics. That works well if you like SAS, but it seems to run against the prevailing sentiment to disaggregate the stack when possible. Making matters even more complicated, analytics are often intimately combined with distributed data management systems like Apache Hadoop or Spark.

One might ask: are there standards for describing analytics abstractly in a way that is analogous to the way that SQL can be used to describe information and interactions with databases? In fact there are two major ones: the Predictive Model Markup Language (PMML) and the Portable Format for Analytics (PFA). PFA is more modern and sophisticated than PMML, but they both allow analytics to be abstracted and managed as concrete assets, separate from the systems and programs of which they are a part.

You can find out more about PFA here: http://dmg.org/pfa/

For an example, let’s take a very simple analytic: adding 100 to an input stream of type double:
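A minimal sketch of such a document follows, assuming the standard PFA layout of "input", "output", and "action" fields. The PFA itself is the JSON string; the small Python function around it is a hypothetical stand-in for a real PFA engine and handles only this one "+" expression.

```python
import json

# The PFA document: JSON declaring the input type, the output type,
# and the action to apply to each input datum.
PFA_DOCUMENT = """
{
  "input": "double",
  "output": "double",
  "action": [
    {"+": ["input", 100]}
  ]
}
"""

def run_add_action(pfa: dict, datum: float) -> float:
    """Toy evaluator (not a PFA engine): supports a single "+" expression
    whose arguments are either the symbol "input" or numeric literals."""
    (expr,) = pfa["action"]
    values = [datum if arg == "input" else float(arg) for arg in expr["+"]]
    return sum(values)

pfa = json.loads(PFA_DOCUMENT)
# Feed a small stream of doubles through the analytic.
results = [run_add_action(pfa, x) for x in [1.0, 2.0, 3.0]]
print(results)  # [101.0, 102.0, 103.0]
```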

Now that is not very pretty, and that is by design. PFA is intended to be easily generated and read by computers. Not surprisingly there is a community that is building around PFA and you can even find flavors that are easy for humans to read and write like PrettyPFA (PPFA): 

https://github.com/opendatagroup/hadrian/wiki/PrettyPFA-Reference

Here is the same analytic in PPFA: 

Here is something you can try at home.  Read this: https://en.wikipedia.org/wiki/Trajectory_of_a_projectile

And build our pumpkin chunkin analytic (or series of analytics) in PFA or PrettyPFA! 
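As a starting point for that exercise, here is a sketch in Python of the quantities such an analytic might compute. It assumes the simplest textbook case (launch and landing at the same height, no air resistance), and the function names are hypothetical; translating it into PFA or PrettyPFA is left as the exercise.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def flight_time(speed: float, angle_deg: float, g: float = G) -> float:
    """Time in the air for a launch at `speed` m/s and `angle_deg` degrees."""
    theta = math.radians(angle_deg)
    return 2.0 * speed * math.sin(theta) / g

def horizontal_range(speed: float, angle_deg: float, g: float = G) -> float:
    """Horizontal distance travelled: v^2 * sin(2*theta) / g."""
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2.0 * theta) / g

def max_height(speed: float, angle_deg: float, g: float = G) -> float:
    """Peak height above the launch point: (v * sin(theta))^2 / (2 * g)."""
    theta = math.radians(angle_deg)
    return (speed * math.sin(theta)) ** 2 / (2.0 * g)
```

Each function takes plain numbers in and gives a number out, which is the shape a PFA action expects: an input record (speed, angle) mapped to an output double.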

Now that we've given a fuzzy, imperfect definition of "an analytic", it is hopefully clearer how the mathematical and more abstract nature of analytics differentiates them from business rules. We’ve also discussed some concrete ways and emerging standards to describe them, which are not tied to any specific general-purpose computing language, information processing architecture, or analytic tool. In our next post we will discuss managing and deploying these "analytics" as valuable organizational assets.

Written by Stu Bailey