Microsoft proudly announced this week that SQL Server 2017 will be the first relational database engine to ship with built-in artificial intelligence (AI) capabilities.
In the future, database servers running its software won’t just be able to store, query, and analyse data. Instead, they’ll also be able to perform machine learning, natural language, and neural network processing - the foundations of modern AI. To make this happen, Microsoft has built support for the Python programming language into its database engine. But why Python? What is Python? And how does this compare to the R integration in SQL Server 2016?
What is Python?
Python is a general-purpose programming language, named after Monty Python. It favours code readability over formality, can be extended to do almost anything using libraries of additional functionality, and is popular with a very broad range of audiences. In case it risks sounding too lightweight to be powerful, history says otherwise - the Reddit.com web site is written using it and Google created a Python library that scales data science calculations to concurrently use thousands of processors. What makes it stand out from the crowd right now is the quantity and quality of AI related libraries for it. Very complex Artificial Intelligence functionality, such as machine learning, is available in relatively easy to consume add-ons, such as those listed here.
Why integrate Python into SQL Server?
One of the most likely reasons why Microsoft has now integrated Python into SQL Server is because it’s increasingly an application developer’s preferred language for performing AI operations. It’s broad language capabilities make it much easier for them to embed data analytics, machine learning and AI functionality into an existing application or database’s code – and then do something with the output, such as reformat it for further use.
This is a significant difference compared to other analytical languages, such as R, which are primarily used by data scientists to discover and analyse data in their own time, rather than perform near-real-time operations within an application. That’s not to say they couldn’t do that because they can, but it doesn’t change the fact that they’re more suited to being used for offline analysis rather than playing a part in application activity.
If anything sums the difference up best, then it’s a saying that says the more you use the R language, the more of a PhD data scientist you are, while the more Python you use, the more of an engineer or developer you are. Similar capabilities yet different audiences, which is why Microsoft is now offering access to both.
How has Microsoft integrated it?
Microsoft has used the same integration approach for Python with SQL Server as it did for R. In a nutshell, that means the Python computation processes run outside of the database engine, but are connected to it using a very fast, secure and proprietary in-memory communications channel. This channel is used to send data from the database to the Python processes and return the results back to SQL Server. The Python code users want to execute, just like R code, is passed as a parameter to a T-SQL stored procedure. This means any application that can execute T-SQL can now also execute Python.
Finally, just like it did with R, Microsoft has provided a series of RevoScalePy packages which allow Python operations to use parallel processing and data sets without size limits.
Bringing the compute to the data
Despite historic reservations, bringing the compute to the data is the new model for enterprise scale data analytics, machine learning and AI. Processor power in most database servers is now plentiful compared to just a few years ago, while the security and performance implications of sending data in and out of a database server are now far more problematic.
Another benefit is that this model allows all of the data in a database to be used as an AI model’s training data, whereas in a data sent to the remote compute model network bandwidth and remote compute resources often limited the amount, and therefore the accuracy, of the training data analysed.
How soon?
Bringing Python capabilities to SQL Server is almost certainly a long term investment for Microsoft, just like integrating R was. It’ll take time for Python developers to consider how they can use a SQL Server platform while it’ll take time for SQL Server developers to consider how they can use Python. But neither are going away and innovation trends show us AI is being used in more and more everyday situations. It’s worth then remembering the capability is there.
Further reading
This first link talks about why and then how to use Python for AI and is useful if you’re new to either or both - http://letzgro.net/blog/creating-ai-using-python/
This second link compares Python to R, not to help you pick one or the other but to help you appreciate their capabilities and how they’re intended to be used - https://elitedatascience.com/r-vs-python-for-data-science