Welcome Michael Barton - Director of Sales

We are very excited to announce that Michael Barton joined Open Data Group today as Director of Sales. For any small company, hiring sales is an important step: it must be timed to match both product and market readiness. Given the last two quarters of work on our product, and the reception of that work in the market, it's clear we needed help.

There are many interesting points of view on when to hire sales, and in what quantity and capacity.  

Bringing Michael on board allows us to scale with the current and future demand for the product, demos and projects we are seeing from the market today.  It's an exciting time at Open Data Group.

FastScore v1.2 - We live to deploy your data science models

It's been a little while since our last blog post, as we've been heads down on software development. We've been improving FastScore together with our partners and customers, guided by the input we are receiving from the market. We're excited to announce FastScore v1.2 today, which marks the third release of the product in six months. The feature velocity is gaining steam, as are the demos.

v1.2 brings a unified capability to deploy analytics from a variety of model producers, including R, Python, and Java. The release also adds support for new input/output stream types and a major performance improvement: 3-5x over prior versions. Check the detailed Release Notes for more information.

It's an exciting time in the analytics market - the number of model producers continues to grow, as do the use cases that can leverage data science.  FastScore continues to be data science language and data platform agnostic - if you can build a model, we can deploy it.  

Histograms and High Level Languages at StrangeLoop

This year’s StrangeLoop conference is less than a week away and I’m psyched. This meeting with an odd name lies at the intersection of an odd blend of topics, including distributed systems, languages, and data science. It would be a natural place for me to talk about PFA, which covers all three, but instead I decided to talk about something new: a language of histogram aggregation called Histo·grammar.

Histo·grammar arose from trying to fit together two conflicting philosophies of how to aggregate data. Histograms are the bread and butter of my first field of study, high energy physics, and high energy physics software views histograms as objects to be filled, like lists in LISP or dictionaries in Python. Non-physics analysis software views histograms as the statistical abstractions they technically are: approximations of a dataset’s distribution. Physics code is infinitely scalable because histograms can forever accumulate data in-place, but it is cumbersome to use in a functional paradigm like Apache Spark. Non-physics histogram APIs are restrictive in how they let you add or access the aggregated data. The key to getting the best of both is to keep the idea of a histogram as a container, but make it a functional container that knows how to fill itself.

To non-physicists, my focus on histograms might seem narrow: after all, isn’t a histogram just one type of chart? According to the statistician’s definition, yes, but the ways physicists have used (abused?) histogram-filling software over the past forty years have led to much, much more. Histo·grammar makes this generality explicit by splitting the histogram into its constituent atoms: composable primitives of data-aggregation that can be used to build a statistician’s histogram and many other aggregate structures.
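To make the idea of composable primitives concrete, here is a minimal Python sketch. It is not the Histo·grammar API: the class names (Count, Sum, Bin), the fill_all helper, and the sample data are all hypothetical, chosen only to illustrate a container that knows how to fill itself, can be nested inside another container, and can be merged with a partial result of the same shape.

```python
# Illustrative sketch only; names and signatures are assumptions,
# not the actual Histo·grammar library.

class Count:
    """Primitive: counts (weighted) entries."""
    def __init__(self, entries=0.0):
        self.entries = entries

    def fill(self, datum, weight=1.0):
        self.entries += weight

    def zero(self):
        return Count()

    def __add__(self, other):
        return Count(self.entries + other.entries)


class Sum:
    """Primitive: sums a quantity extracted from each datum."""
    def __init__(self, quantity, total=0.0):
        self.quantity = quantity
        self.total = total

    def fill(self, datum, weight=1.0):
        self.total += weight * self.quantity(datum)

    def zero(self):
        return Sum(self.quantity)

    def __add__(self, other):
        return Sum(self.quantity, self.total + other.total)


class Bin:
    """Container: routes each datum to one of `num` sub-aggregators."""
    def __init__(self, num, low, high, quantity, value=None, values=None):
        self.num, self.low, self.high = num, low, high
        self.quantity = quantity
        if values is None:
            proto = value if value is not None else Count()
            values = [proto.zero() for _ in range(num)]
        self.values = values

    def fill(self, datum, weight=1.0):
        x = self.quantity(datum)
        if self.low <= x < self.high:
            i = int(self.num * (x - self.low) / (self.high - self.low))
            i = min(i, self.num - 1)  # guard against floating-point round-up
            self.values[i].fill(datum, weight)

    def zero(self):
        return Bin(self.num, self.low, self.high, self.quantity,
                   values=[v.zero() for v in self.values])

    def __add__(self, other):
        # Merging assumes both sides share the same binning.
        return Bin(self.num, self.low, self.high, self.quantity,
                   values=[a + b for a, b in zip(self.values, other.values)])


def fill_all(aggregator, data):
    """Fill an aggregator with every datum and return it."""
    for datum in data:
        aggregator.fill(datum)
    return aggregator


# Made-up sample records.
data = [{"x": 0.5, "y": 2.0}, {"x": 1.5, "y": 3.0},
        {"x": 1.7, "y": 5.0}, {"x": 9.0, "y": 1.0}]

# Bin of Counts: a statistician's histogram of x.
hist = fill_all(Bin(2, 0.0, 2.0, lambda d: d["x"]), data)

# Bin of Sums: the total of y in each bin of x, built from the same parts.
sums = fill_all(Bin(2, 0.0, 2.0, lambda d: d["x"],
                    value=Sum(lambda d: d["y"])), data)

# Partial results filled on separate partitions merge with "+".
left = fill_all(hist.zero(), data[:2])
right = fill_all(hist.zero(), data[2:])
merged = left + right
```

Swapping the primitive nested inside Bin changes what is aggregated without touching the binning logic, and because every aggregator supports zero() and "+", partial results from separate partitions can be combined in the style of a distributed reduce.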

As datasets get larger in all fields, having a way to summarize them with complex aggregations will be increasingly important. I’ll show how the same declarative language can slice and dice data in HDFS, can be JIT-compiled for blazing speed, and can even be parallelized across vector devices like GPUs.

Around the time I was developing PFA, someone asked me if it was a big transition from particle physics to data science. I said no, because particle physics is the most industrial field in academia and data science is the most academic field in industry. Conferences like StrangeLoop prove this point, in that philosophical musings on some esoteric language can be followed by the next big software stack. If you’ll be there, I’m the guy with the long, scraggly beard (non-unique identifier?) and would love to hear your latest great idea.

A link to an overview of my talk can be found here.

Written by Jim Pivarski