July 25, 2017

Data teams today can choose from a large and growing number of languages for analytic projects, and that number will surely increase as new languages are developed.

[Image: Why Your Models Are Getting Lost in Translation]

While some languages may seem to overlap in performance and functionality, the most commonly used data languages were designed to fulfill different purposes. As a result, data teams go through a discovery phase to determine which language and model format will yield their desired results. Data Scientists focus on building quality models, so they need languages that are well suited to that work. Two of the most common are Python and R. These languages let Data Scientists train and test their models easily, and they provide a large set of features that make building well-performing models more achievable.

Because each language has distinct key capabilities, it is not unusual for different teams within an organization to use different languages. The models that Data Scientists build in one language are then converted by IT so that they are compatible with the organization's business applications.

To do this, IT requires frequent and transparent communication within its team, which is why it needs languages that are built to support larger groups. For this reason, IT generally uses Java or C++, because these languages provide structure and safety when a model is passed back and forth between users.

While it is unavoidable and necessary that Data Scientists and IT use different languages to build and deploy a model, there are many challenges that arise when trying to move a model from one team to another. A few of the most prominent obstacles are:

  • Most programming languages do not work together. The biggest issue many Data Science and IT teams face is taking a model written in one language and recoding it in another. This is a labor-intensive and often error-prone task.
  • Models can get convoluted in transition. The minute a model is moved from Data Science to IT, it is open to mistakes. If the model is not implemented correctly, there’s a risk of altering the model or breaking it outright.
  • Versioning between languages creates its own issues. Moving a model from one language to another is already a hassle. Having to track which version of a language is in use, and which functions that version supports, creates more obstacles to work around.

Over the years, a few tools have been created to ease the process of transferring models from Data Science to IT. Model interchange formats, such as the Portable Format for Analytics (PFA), have been used in particular. The user expresses the model in the interchange format, and that representation is then used to carry the model from one language to the next.
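To make the interchange idea concrete, here is a minimal sketch in Python. It assumes a made-up, simplified JSON layout for a linear model. This is not real PFA syntax, and the names export_linear_model and load_scorer are hypothetical, chosen only for illustration. One side writes the fitted parameters to a language-neutral document, and the other side rebuilds a scoring function from it.

```python
import json

# Hypothetical, simplified interchange document for a linear model.
# This layout is an assumption for illustration; it is NOT real PFA.


def export_linear_model(coefficients, intercept):
    """Producer side: write the fitted parameters to a language-neutral JSON string."""
    return json.dumps({
        "model_type": "linear_regression",
        "coefficients": list(coefficients),
        "intercept": intercept,
    })


def load_scorer(document):
    """Consumer side: rebuild a scoring function from the JSON document."""
    spec = json.loads(document)
    if spec["model_type"] != "linear_regression":
        raise ValueError(f"unsupported model type: {spec['model_type']}")
    coefficients = spec["coefficients"]
    intercept = spec["intercept"]

    def score(features):
        # The consumer must know exactly how the parameters are meant to be used.
        if len(features) != len(coefficients):
            raise ValueError("feature count does not match the model")
        return intercept + sum(c * x for c, x in zip(coefficients, features))

    return score


document = export_linear_model([2.0, -1.0], 0.5)
score = load_scorer(document)
print(score([3.0, 4.0]))  # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5
```

Even in this toy case, both sides have to agree on what every field means, and any consumer written in Java or C++ would need its own loader that interprets the document in exactly the same way.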

While these formats have helped, they have issues of their own that programmers have to work around. To begin, writing a converter requires an intimate understanding of how the model works in its native language. The user must know everything about the model’s parameters and how it will run in PFA. Without that understanding, a change to the model may mean starting the conversion from scratch. There’s also the burden of dealing with yet another language, which is the very problem the format was meant to work around in the first place.

By now, it may seem as if there’s no easy way to translate a model from one language to the next. However, FastScore offers a simplified, user-friendly process that makes scoring and deploying models easy, with no interchange format needed. Data flows from data sources into the engine, the model is loaded into the engine in its native language, and the engine produces an output.

FastScore captures the whole environment. It is built to be agnostic, so you can use any tools, languages, or versions, and the model will run in exactly the same way within the engine.  

To learn more about how FastScore can deploy your models no matter what language, click here!
