Introduction to Azure Synapse Analytics Serverless SQL (2023)

The new Azure Synapse Analytics serverless SQL service allows you to query file-based data stored in an Azure Data Lake Gen2 storage account using familiar T-SQL. The data remains in the source storage account, and CSV, Parquet, and JSON are currently supported file types. The service allows data roles such as data engineers and data analysts to query the storage and also write data back to it.

The pricing model is based on the amount of data processed rather than service levels or uptime. The current cost is $5 (approximately £3.73) per 1 terabyte (TB) processed. Note that this is not a minimum charge; you can run queries that process much less data and be billed accordingly. For example, if you run a query that processes 100 GB of data, you will be charged approximately £0.37. Data processed includes retrieving data from storage, reading file metadata, transferring data between processing nodes, and returning results to the serverless SQL endpoint. The serverless SQL service has been generally available (GA) since November 2020.


In this scenario we create:

  • An Azure Data Lake Gen2 storage account for storing the CSV and Parquet files
  • An Azure Synapse Analytics workspace with the serverless SQL engine
  • 2 external SQL tables to query the CSV and Parquet data with T-SQL.

Requirements and Costs

To follow the tutorial, you need access to an Azure subscription with permissions to create resources. There is no cost for creating the Synapse Analytics workspace itself; however, there are storage costs and serverless SQL data-processing costs. Running the example in this blog post costs less than £1. For pricing details, see the References section at the end of this post.


Step by Step

Now we will go through the steps required to create the Azure resources. The References section at the end of this blog post provides links to official documentation that expands on the material covered here.

Create the Azure Services

Resource Group

  • Sign in to the Azure portal at https://portal.azure.com
  • Create a new resource group (search for the term in the search bar) by clicking Add in the Resource Groups section.
  • Choose the appropriate subscription to hold the resource group and select a region. Make sure you select the same region in all subsequent steps.

Create the Storage Account

  • Create a new storage account by clicking Add in the Storage Accounts section.
  • Enter the following information on the Basics tab:
    • Subscription: your appropriate subscription.
    • Resource group: select the resource group created in the previous step.
    • Storage account name: enter a suitable name, e.g. storsynapsedemo.
    • Location: select the same region as the resource group.
    • Performance: Standard
    • Account kind: StorageV2
    • Replication: Zone-redundant storage (ZRS)
  • Enter the following information on the Networking tab:
    • Connectivity method: Public endpoint (all networks), suitable for development purposes.
    • Network routing: Microsoft network routing (default)
  • Enter the following information on the Advanced tab:
    • Blob storage: Allow Blob public access: Disabled
    • Data Lake Storage Gen2 hierarchical namespace: Enabled
  • Select Review + Create, then after successful validation select Create.

After creating the new storage account, we need to add a new container and folder structure.

  • Under the new storage account, select Containers in the Data Lake Storage section.
  • Select + Container to create a new container named datalakehouse.
  • Click into the new container, select + Add Directory, and enter factinternetsalesbig as the name.
  • Create 2 new folders named csv and parquet in the factinternetsalesbig folder.

The data itself is based on the FactInternetSales data from the Microsoft AdventureWorksDW sample database. A CSV (comma-separated values) file and a Parquet (compressed columnar binary file format) file are available on GitHub. You can make multiple copies of each file and upload them to the appropriate folder in your storage account. This demo uses 3 copies of each file; the example files on GitHub are smaller due to file size limitations, but the process is the same apart from the size of the files.


Synapse Analytics Workspace

  • Create a new Synapse Analytics workspace by searching for Synapse Analytics and selecting Azure Synapse Analytics (workspaces preview).
  • Select Add and enter the following information on the Basics tab:
    • Subscription: your appropriate subscription.
    • Resource group: select the resource group created in the previous step.
    • Workspace name: choose an appropriate name.
    • Region: select the same region as the resource group.
    • Select Data Lake Storage Gen2 account name: choose Create new and enter an appropriate name.
    • File system name: choose Create new and enter a name such as users.
  • Enter the following information on the Security tab:
    • SQL administrator username: enter an appropriate username for the SQL administrator account.
    • Password: enter (and confirm) a suitable password.
  • Enter the following information on the Networking tab:
    • Disable the Allow connections from all IP addresses checkbox.
  • Select Review + Create, then after successful validation select Create.

Once the Synapse Analytics workspace has been created, a firewall rule needs to be added to ensure you can connect to the service.

  • In the Azure portal, select the new Synapse Analytics workspace, then select Firewalls under the Security section.
  • Make sure Allow Azure services and resources to access this workspace is set to On.
  • Enter your client IP address in the Start IP and End IP text fields and give the rule an appropriate name.

Security Settings

We are using user-based security for serverless SQL to access the storage account, so the user account used to run the queries needs the appropriate permissions on the storage.


  • In the storage account created at the beginning of the process, select Access control (IAM) and click Add > Add role assignment.
  • Make sure the user account is assigned to the Storage Blob Data Contributor role and save.
  • It may take a few minutes for security changes to take effect.

Serverless SQL Database and SQL Objects

We can now create the SQL objects in the serverless SQL environment that will be used to query the data.

  • Navigate to the Synapse Analytics workspace in the Azure portal.
  • In the Overview section, click the link in the Workspace web URL field to open Synapse Analytics Studio.
  • Expand the menu on the left and select the Data section.
  • Select the + icon and select SQL database.
  • Make sure Serverless is selected as the Select SQL pool type option and enter a name, e.g. sqldatalakehouse (the database can also be created with T-SQL, as shown below).
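
As an alternative to the Studio UI, the database can also be created from a SQL script connected to the Built-in pool. A minimal sketch, using the database name from this example:

-- Create a database in the serverless (Built-in) SQL pool
CREATE DATABASE sqldatalakehouse;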

Once the new serverless SQL database has been created, let's create the required SQL objects so that we can read (SELECT) data from the storage account.

  • Select Develop in the left menu pane of Synapse Studio.
  • Select the + icon and select SQL script.
  • In the Properties window on the right, give the script a name such as Create SQL Objects.
  • Make sure Connect to is set to Built-in.
  • Make sure Use database is set to sqldatalakehouse (or the name of your new serverless SQL database).

The following SQL code can be copied into the blank script to create the SQL objects, running each statement individually. Alternatively, you can create a new script file in Synapse Studio for each statement.

Create a database master key

-- Create a master key to protect the database scoped credential (replace <STRONG_PASSWORD>)
IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE name LIKE '%DatabaseMasterKey%')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<STRONG_PASSWORD>';
Create the database scoped credential

-- Create a credential that uses the calling Azure AD user's identity to access storage
IF NOT EXISTS (SELECT * FROM sys.database_scoped_credentials WHERE name = 'SynapseUserIdentity')
    CREATE DATABASE SCOPED CREDENTIAL SynapseUserIdentity WITH IDENTITY = 'User Identity';
Create external file formats for CSV and Parquet

-- File format for the Parquet files
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseParquetFormat')
    CREATE EXTERNAL FILE FORMAT SynapseParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

-- File format for the pipe-delimited, quoted CSV files with a header row
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'QuotedCsvWithHeaderFormat')
    CREATE EXTERNAL FILE FORMAT QuotedCsvWithHeaderFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (PARSER_VERSION = '2.0', FIELD_TERMINATOR = '|', STRING_DELIMITER = '"', FIRST_ROW = 2));
Create an external data source

The <STORAGE_ACCOUNT_NAME> placeholder is the name of the storage account created in the Create the Storage Account section, not the storage account created with the Synapse workspace, as this is where the CSV and Parquet data is stored.


-- External data source pointing at the datalakehouse container
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'storagedatalakehouse')
    CREATE EXTERNAL DATA SOURCE storagedatalakehouse
    WITH (LOCATION = 'https://<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/datalakehouse');
Create external tables for the CSV and Parquet files

The following 2 SQL scripts create external tables. These are tables that can be queried in the same way as regular tables, but the data remains in the storage account and is never loaded into the tables themselves. External tables are the only table type that serverless SQL supports.
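
Both tables are created in a dw schema, which this walkthrough does not create explicitly. If the schema does not yet exist in your database, create it first; a minimal sketch:

-- Create the dw schema used by the external tables if it is missing
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'dw')
    EXEC('CREATE SCHEMA dw');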

-- External table over the CSV files
CREATE EXTERNAL TABLE dw.FactInternetSalesBig (
    ProductKey INT, OrderDateKey INT, DueDateKey INT, ShipDateKey INT,
    CustomerKey INT, PromotionKey INT, CurrencyKey INT, SalesTerritoryKey INT,
    SalesOrderNumber NVARCHAR(20), SalesOrderLineNumber TINYINT, RevisionNumber TINYINT,
    OrderQuantity SMALLINT, UnitPrice DECIMAL(10,4), ExtendedAmount DECIMAL(10,4),
    UnitPriceDiscountPct FLOAT, DiscountAmount FLOAT, ProductStandardCost DECIMAL(10,4),
    TotalProductCost DECIMAL(10,4), SalesAmount DECIMAL(10,4), TaxAmt DECIMAL(10,4),
    Freight DECIMAL(10,4), CarrierTrackingNumber NVARCHAR(25), CustomerPONumber NVARCHAR(25),
    OrderDate DATETIME2(7), DueDate DATETIME2(7), ShipDate DATETIME2(7)
)
WITH (
    LOCATION = '/factinternetsalesbig/csv',
    DATA_SOURCE = storagedatalakehouse,
    FILE_FORMAT = QuotedCsvWithHeaderFormat
);

-- External table over the Parquet files
CREATE EXTERNAL TABLE dw.FactInternetSalesBigParquet (
    ProductKey INT, OrderDateKey INT, DueDateKey INT, ShipDateKey INT,
    CustomerKey INT, PromotionKey INT, CurrencyKey INT, SalesTerritoryKey INT,
    SalesOrderNumber NVARCHAR(20), SalesOrderLineNumber TINYINT, RevisionNumber TINYINT,
    OrderQuantity SMALLINT, UnitPrice DECIMAL(10,4), ExtendedAmount DECIMAL(10,4),
    UnitPriceDiscountPct FLOAT, DiscountAmount FLOAT, ProductStandardCost DECIMAL(10,4),
    TotalProductCost DECIMAL(10,4), SalesAmount DECIMAL(10,4), TaxAmt DECIMAL(10,4),
    Freight DECIMAL(10,4), CarrierTrackingNumber NVARCHAR(25), CustomerPONumber NVARCHAR(25),
    OrderDate DATETIME2(7), DueDate DATETIME2(7), ShipDate DATETIME2(7)
)
WITH (
    LOCATION = '/factinternetsalesbig/parquet',
    DATA_SOURCE = storagedatalakehouse,
    FILE_FORMAT = SynapseParquetFormat
);
Run SELECT to test access
-- Each query reads from the storage account and counts towards data processed
SELECT TOP 10 * FROM dw.FactInternetSalesBig;
SELECT TOP 10 * FROM dw.FactInternetSalesBigParquet;
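
If you want to explore the files without defining external tables first, serverless SQL also supports ad hoc queries with OPENROWSET. A minimal sketch, using the storage account and container names from this walkthrough:

-- Ad hoc query over the Parquet folder; replace <STORAGE_ACCOUNT_NAME> as before
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/datalakehouse/factinternetsalesbig/parquet/*.parquet',
    FORMAT = 'PARQUET'
) AS [result];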

We can now run T-SQL statements against the external tables. The tables can also be accessed through SQL client tools such as SQL Server Management Studio and Azure Data Studio, as well as business intelligence tools such as Power BI.
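
As an illustration of a typical analytical query against these tables (the column names come from the FactInternetSales schema defined above):

-- Total sales by sales territory, read directly from the Parquet files
SELECT SalesTerritoryKey,
       SUM(SalesAmount) AS TotalSalesAmount,
       COUNT(*) AS OrderLineCount
FROM dw.FactInternetSalesBigParquet
GROUP BY SalesTerritoryKey
ORDER BY TotalSalesAmount DESC;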

Monitoring Data Processed

The Monitor pane in Synapse Analytics Studio contains several sections for monitoring Synapse Analytics features, including serverless SQL. Click the Monitor pane, then click SQL requests under the Activities section. Make sure the Pool filter is set to Built-in to view serverless SQL queries. This displays the SQL queries that have been run, including the query syntax and, more importantly, the Data processed metric. Use this metric to determine the amount of data processed (and therefore billable) by your workloads and to plan costs.
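
The amount of data processed can also be checked with T-SQL. A minimal sketch, based on the documented sys.dm_external_data_processed DMV in serverless SQL pools:

-- Data processed (in MB) in the current daily, weekly, and monthly periods
SELECT type, data_processed_mb
FROM sys.dm_external_data_processed;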


Controlling Data Processed Costs

To avoid unexpected costs when using the serverless SQL engine, there is a Cost Control feature in the Manage > SQL pools area. Select this feature and add daily, weekly, and monthly limits in increments of 1 terabyte. For example, set 1 for daily, 2 for weekly, and 4 for monthly to ensure that the data processed does not exceed 1 TB per day, 2 TB per week, and 4 TB per month.
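
The same limits can also be set with T-SQL while connected to the serverless SQL pool's master database. A minimal sketch, based on the documented sp_set_data_processed_limit procedure:

-- Cap the billable data processed per period (values are in TB)
EXEC sp_set_data_processed_limit @type = N'daily', @limit_in_tb = 1;
EXEC sp_set_data_processed_limit @type = N'weekly', @limit_in_tb = 2;
EXEC sp_set_data_processed_limit @type = N'monthly', @limit_in_tb = 4;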


Clean Up

To remove resources from Azure, delete the resource group created at the beginning of this tutorial. All resources in this resource group will be deleted.

References

Videos

1. Azure Synapse Lakehouse with Serverless SQL and Spark Tables Tutorial (Edward Kench)
2. Introduction to Azure Synapse Serverless Pool Warner Chaves (DBAFundamentals)
3. Azure Synapse Analytics Serverless SQL Pools (DesignMind)
4. Using SQL On Demand with Serverless Compute in Synapse Analytics (Pragmatic Works)
5. Azure Synapse Analytics - Automating Serverless SQL Views (Advancing Analytics)
6. 13. Server less SQL Pool Overview in Azure Synapse Analytics (WafaStudies)
