Prerequisites

Version Compatibility

  • SQL Server versions:
    • Officially declared: 2016 - 2025
    • Actually tested: 2017, 2025
  • Spark versions: 3.2.x - 4.1.x
  • Java versions: 8 - 22

See the official documentation and the official compatibility matrix.
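As a rough illustration of the Spark range listed above, a version string can be checked against the 3.2.x - 4.1.x bounds before any connection is attempted. This is a minimal sketch: the helper names `parse_major_minor` and `is_supported_spark` are hypothetical and not part of any library, and the bounds are copied from the list above.

```python
# Supported Spark range taken from the list above: 3.2.x - 4.1.x (inclusive).
SPARK_MIN = (3, 2)
SPARK_MAX = (4, 1)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '3.5.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_supported_spark(version: str) -> bool:
    """Return True if the major.minor version falls inside the supported range."""
    return SPARK_MIN <= parse_major_minor(version) <= SPARK_MAX


print(is_supported_spark("3.5.1"))  # True
print(is_supported_spark("3.1.3"))  # False
```

In a live session the version string would typically come from `pyspark.__version__`; the patch component is ignored because the range is stated per minor release.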

Installing PySpark

To use the MSSQL connector, PySpark must be installed (or injected into sys.path) BEFORE the connector instance is created.

See the installation instructions for more details.
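One way to verify this requirement early is to check whether `pyspark` is importable from the current sys.path before constructing anything. The sketch below uses only the standard library; the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if not module_available("pyspark"):
    # Fail early with a readable message instead of an ImportError later on.
    print("PySpark not found: install it or add it to sys.path before creating the connector")
```

The check only locates the package and does not import it, so it is cheap to run at startup.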

Connecting to MSSQL

Connection port

Connections usually go to port 1433, but the port may differ between MSSQL instances. Ask your MSSQL administrator for the correct value.

For named MSSQL instances (instanceName option), the port number is optional and can be omitted.
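To make the port and instance name rules concrete, the sketch below assembles a JDBC URL in the format used by the Microsoft JDBC driver for SQL Server. The helper `build_mssql_jdbc_url` is hypothetical, and the host, database, and instance names are placeholders. How the connector itself builds its URL is not shown here, so treat this as an illustration of the rule rather than the actual implementation.

```python
from typing import Optional


def build_mssql_jdbc_url(
    host: str,
    database: str,
    port: Optional[int] = None,
    instance_name: Optional[str] = None,
) -> str:
    """Build a SQL Server JDBC URL; the port is optional for named instances."""
    # Assumption: fall back to the usual port 1433 only when neither
    # a port nor an instance name was given.
    if port is None and instance_name is None:
        port = 1433

    url = f"jdbc:sqlserver://{host}"
    if port is not None:
        url += f":{port}"
    url += f";databaseName={database}"
    if instance_name is not None:
        url += f";instanceName={instance_name}"
    return url


print(build_mssql_jdbc_url("mssql.example.com", "mydb"))
# jdbc:sqlserver://mssql.example.com:1433;databaseName=mydb
print(build_mssql_jdbc_url("mssql.example.com", "mydb", instance_name="myinstance"))
# jdbc:sqlserver://mssql.example.com;databaseName=mydb;instanceName=myinstance
```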

Connection host

You can connect to MSSQL using either the DNS name of the host or its IP address.

If you're using an MSSQL cluster, it is currently possible to connect to only one specific node. Connecting to multiple nodes for load balancing and automatic failover to a new master/replica are not supported.
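Because only a single node can be targeted, it may help to confirm up front whether the configured host is an IP address or a DNS name, and whether that one node accepts TCP connections on the expected port. The sketch below uses only the standard library; `is_ip_address` and `can_reach` are hypothetical helper names, and the default port 1433 is an assumption that should be replaced with the value from your administrator.

```python
import ipaddress
import socket


def is_ip_address(host: str) -> bool:
    """Return True if host is a literal IPv4/IPv6 address, False for DNS names."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_ip_address("10.0.0.5"))           # True
print(is_ip_address("mssql.example.com"))  # False
```

A successful `can_reach` call only shows that the port is open; it does not validate credentials or grants.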

Required grants

Ask your MSSQL cluster administrator to set the following grants for the user that creates the connection:

-- Variant 1: read and write, table in the user's own schema

-- allow creating tables for user
GRANT CREATE TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON username.mytable TO username;

-- only if if_exists="replace_entire_table" is used:
-- allow dropping/truncating the specific table
GRANT ALTER ON username.mytable TO username;

-- Variant 2: read and write, table in another schema

-- allow creating tables for user
GRANT CREATE TABLE TO username;

-- allow managing tables in specific schema, and inserting data to tables
GRANT ALTER, SELECT, INSERT ON SCHEMA::someschema TO username;

-- Variant 3: read only

-- allow read access to specific table
GRANT SELECT ON someschema.mytable TO username;

More details can be found in the official documentation: