Working With Data in SQL Server and Python 3
Here are steps you can follow to begin working with Microsoft SQL Server databases using Python.
Installing SQL Server Developer Edition
I’ll begin with installing SQL Server Developer Edition. You can get it by downloading it from this page: https://www.microsoft.com/en-us/sql-server/sql-server-downloads. Choose the Custom installation type.
This will bring you to the SQL Server Installation Center. You’ll see dozens of options to plan, install and maintain SQL Server. I chose the option to install a New SQL Server stand-alone installation, and later selected the Developer edition. Developer is functionally equivalent to the Enterprise edition SQL Server, but is limited to non-production application connectivity only.
You’ll come to a page that asks you to choose the features to install. I chose to install SQL Server’s Machine Learning Services with the Python language. I also want to explore configuring the Machine Learning Server with Python, so I included that as well. Keep in mind that this isn’t a sane choice for a production system. The Machine Learning Server will compete for database resources and reduce general performance.
Choose a name for your SQL Server instance. You’ll notice that I already have a SQL Server Express instance installed. The installer will proceed to install the new instance side-by-side with any existing instances.
You’ll need to configure the Windows services that run in the background. I chose automatic startup types so I don’t have to remember to turn SQL Server on manually.
On the Server Configuration tab, I chose Mixed Mode authentication for greater flexibility during development. I can access SQL Server through a Trusted Connection (Windows Authentication) or standard username & password credentials depending on the situation.
In a production environment I would tend to prefer using a Trusted Connection, while SQL Server authentication is good enough to begin with during development.
The Python installation step is next. SQL Server will do this for you automatically. Don’t worry if you already have an existing installation of Python. You don’t have to take extra steps to uninstall (or protect) other versions of Python installed on your system. Anaconda will compartmentalize itself in a distant subfolder to avoid interference.
The installation may take some time. While you wait, if you’d like an in-depth guide for developing applications on SQL Server, take a look at Murach’s SQL Server 2016 for Developers.
Remember to install the SQL Server Management Tools from the Installation Center (pictured above)!
Once everything is installed, we’re ready to move to the code!
Talk SQL to me (with pyodbc)
Next let’s focus on basic Python script connectivity to the SQL Server instance. We’ll achieve this by accessing the database through an ODBC interface. Pull down the ODBC library, pyodbc. (Note: while we could instead use the pymssql library, Microsoft officially sanctioned pyodbc as the preferred connectivity module. That’s good enough for me!)
you@yourterm: pip install pyodbc
Create a small script to execute the following code. In it you’ll import the library and connect to the database using the default sa account and the password you set. You’ll also need to initalize a pyodbc cursor to wade through the data.
import pyodbc
db = pyodbc.connect(driver='{SQL Server Native Client 11.0}',
server='localhost\dev',
database='',
uid='sa',
pwd='Your-Pa$$word')
cursor = db.cursor()
cursor.execute('select @@version;')
for row in cursor:
print(f"rows: {row}")
Note that the database parameter is blank. We’ll fix that later. Your output should look something like this:
rows: ('Microsoft SQL Server 2017 (RTM-GDR) (KB4293803) - 14.0.2002.14 (X64) \n\tJul 21 2018 07:47:45 \n\tCopyright (C) 2017 Microsoft Corporation\n\tDeveloper Edition (64-bit) on Windows 10 Home 10.0 (Build 17134: )\n', )
Nice. Now we’ll want to create a new database. But to do that, you’ll first need to enable pyodbc’s autocommit feature. Without it you’ll get an ugly error, among other things. We’ll need a way to verify the existence of the database, so let’s run a simple stored procedure against the database and check the results.
db = pyodb.connect()
db.autocommit = True
# ...
cursor.execute('CREATE DATABASE MySampleDatabse;')
cursor.execute('EXEC sp_helpdb MySampleDatabase;')
for row in cursor:
print(f"rows: {row}")
cursor.execute('''USE MySampleDatabase;
CREATE TABLE Locations
(Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
Name NVARCHAR(50) NOT NULL,
UpdatedAt DATE DEFAULT GETDATE() NOT NULL);''')
cursor.execute('''USE MySampleDatabase;
INSERT INTO Locations (Name)
VALUES ('Colorado');''')
The output should show some basic details about the database. Great!
rows: ('MySampleDatabase', '16.00 MB', 'sa', 6, 'Aug 18 2018', 'Status=ONLINE, Updateability=READ_WRITE, UserAccess=MULTI_USER, Recovery=FULL, Version=869, Collation=SQL_Latin1_General_CP1_CI_AS, SQLSortOrder=52, IsAutoCreateStatistics, IsAutoUpdateStatistics, IsFullTextEnabled', 140)
Take a look over the pyodbc docs for more information. On the other hand, is all the Python code starting to feel a little overwhelming? If so, check out The Illustrated Guide to Python 3 for a quality primer on the language.
We’ll refrain from printing the results of the CREATE or INSERT operations. If we try, pyodbc will complain with a ProgrammingError that informs us the previous SQL was not a query, and there were no results.
Of course, we do want to see the results the INSERT to check if everything went well. Obviously we could just continue to execute a SELECT statement against the table and print the results — but we’d get a little more flexibility and power by introducing pandas to the picture.
pandas DataFrames
Typically when working with SQL data in Python, you’ll want to explore that data quickly. The pyodbc library and syntax is useful for some operations, but cumbersome for something as serious as data exploration.
To do that, we’ll want a pandas DataFrame. Let’s begin by installing the pandas module.
you@yourterm: pip install pandas
We’ll create a DataFrame by running a SELECT SQL in the read_sql_query method. It accepts our db connection as a parameter. We need to specify the desired database in the connection string as well.
import pandas as pd
db = pyodb.connect(
# ...
database='MySampleDatabase'
# ...
)
#...
sql = 'SELECT * FROM Locations;'
df = pd.read_sql_query("SELECT * FROM Locations;", db)
print(df.head(1), end='\n\n')
print(df['Name'])
Which produces the results we desire:
Id Name UpdatedAt 0 Colorado 2018-08-18
Thorough instructions on pandas can be found in Python Data Science Handbook.
Okay, so clearly pandas and pyodbc play together nicely. But it turns out they’re not exactly soulmates. In fact, the pandas documentation instead recommends the use of the excellent SQLAlchemy database library. Let’s look at that next.
you@yourterm: pip install sqlalchemy
We’ll do ourselves a favor and refresh the code while we switch over to the new library. Create an SQLAlchemcy Engine in the MSSQL dialect against a database URL in the “dialect+driver://username:password@host:port/database” format.
from sqlalchemy import create_engine
SERVER = 'localhost\\DEV'
DATABASE = 'MySampleDatabase'
DRIVER = 'SQL Server Native Client 11.0'
USERNAME = 'sa'
PASSWORD = 'Y0urP@s$w0rd'
DATABASE_URL = f'mssql://{USERNAME}:{PASSWORD}@{SERVER}/{DATABASE}?driver={DRIVER}'
engine = create_engine(DATABASE_URL)
connection = engine.connect()
df = pd.read_sql_query("SELECT * FROM Locations;", connection)
print(df.head(1), end='\n\n')
print(df['Name'])
connection.close()
And we should see the same output from the Locations table as seen above. I encourage you to check out the SQLAlchemy docs on basic usage of the library. A comprehensive introduction is also available in Essential SQLAlchemy by Jason Myers and Rick Copeland.
With SQLAlchemy you can easily define classes to work with your data in models (e.g. the M in MVC) as well. This will make it easier to integrate your data functionality into a web app (Flask, yo) in the future.
So far I’ve demonstrated various ways an external Python script can talk with SQL Server to send and receive data. But this is only part of the story. In the next section, we’ll explore how you can leverage Python scripts from within SQL Server.
The call is coming from inside the database!
While Postgres has featured Python interop for some time now, SQL Server has only recently embraced the language.
The way it works is pretty straightforward: you’ll execute a stored procedure that makes an external call to the Anacondas Python language distro the installer deployed.
But before you can execute Python scripts from a T-SQL, you’ll need to do come configuration (and probably some tinkering). First, execute this command in SQL Server Management Studio to enable execution of Python scripts.
-- Start with...
EXEC sp_configure 'external scripts enabled', 1
-- Followed by...
RECONFIGURE WITH OVERRIDE
GO
You should see something like this message.
You can check whether external scripts are enabled executing sp_configure in SSMS. If there’s a 1 present in the “maximum”, “config_value” and “run_value” columns, the operation was a success.
By the way, you’ll need to restart your SQL Server services after running RECONFIGURE.
Unfortunately I found enabling external scripts wasn’t enough to get Python working from inside SQL Server.
After a bit of searching, I discovered the pythonlauncher.config file. In it I found an entry for ENV_PYTHONHOME. It pointed to my system Python, not the Anaconda version installed by SQL Server. Aha!
Deleting the ENV_PYTHONHOME line was enough to get Python working in my environment. Review these troubleshooting steps if you continue to have trouble.
Time for the payoff! Let’s execute a Python script from T-SQL.
EXEC sp_execute_external_script
@language = N'Python',
@script = N'print(2+2)'
Which ought to result in this message if Python is working correctly:
STDOUT message(s) from external script: 4
It’s alive! Let’s try something with a little more pizazz. I’m going to switch over from SSMS to the new preview of Microsoft’s SQL Operations Studio.
EXEC sp_execute_external_script
@language = N'Python',
@script = N'OutputDataSet = InputDataSet',
@input_data_1 = N'SELECT 1 as PythonValue',
@output_data_1_name = N'OutputDataSet'
WITH RESULT SETS ((PythonValue int NOT NULL));
GO
The Results pane correctly shows the PythonValue of 1. If you made it this far, give yourself a pat on the back!
Of course, you can execute longer Python scripts and import libraries exactly the way you’d expect.
EXEC sp_execute_external_script
@language =N'Python',
@script=N'
import pandas as pd
series = pd.Series([1,2,3,4,5,6])
df = pd.DataFrame(series)
df.reset_index(inplace=True)
OutputDataSet = df
'
WITH RESULT SETS (( MyIndex int, SeriesValue int ));
Well, that’s a wrap. I hope you’ve found this blog post useful and learned a little bit about Python-SQL Server interoperability.