Query storage files with a serverless SQL pool - Azure Synapse Analytics (2023)


Serverless SQL pool allows you to query data in your data lake. It provides a T-SQL query interface that supports queries on semi-structured and unstructured data. For queries, the following aspects of T-SQL are supported:

  • The complete SELECT surface area, including most SQL functions and operators.
  • CREATE EXTERNAL TABLE AS SELECT (CETAS), which creates an external table and exports the results of a Transact-SQL SELECT statement to Azure Storage in parallel.

For more information on what is and isn't currently supported, see the Serverless SQL pool overview article or the following articles:

  • Develop storage access, where you can learn how to use external tables and the OPENROWSET function to read data from storage.
  • Control storage access, where you can learn how to let Synapse SQL access storage by using SAS authentication or the workspace managed identity.

Overview

To support a smooth experience for in-place querying of data that resides in Azure Storage files, serverless SQL pool uses the OPENROWSET function with additional capabilities:

  • Scan multiple files or folders
  • PARQUET file format
  • Query CSV and delimited text (field terminator, row terminator, escape character)
  • Delta Lake format
  • Read a selected subset of columns
  • Schema Inference
  • filename function
  • file path function
  • Work with complex types and nested or repeating data structures

Query Parquet files

To query the Parquet source data, use FORMAT = 'PARQUET':

SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
    FORMAT = 'PARQUET'
) WITH (C1 int, C2 varchar(20), C3 varchar(max)) AS rows

See the Query Parquet files article for usage examples.

Query CSV files

To query CSV source data, use FORMAT = 'CSV'. You can specify the schema of the CSV file as part of the OPENROWSET function when you query CSV files:

SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) WITH (C1 int, C2 varchar(20), C3 varchar(max)) AS rows

There are some additional options that can be used to adjust the parsing rules to a custom CSV format:


  • ESCAPE_CHAR = 'char' specifies the character in the file that is used to escape itself and all delimiter values in the file. If the escape character is followed by a value other than itself, or any of the delimiter values, the escape character is dropped when reading the value. The ESCAPE_CHAR parameter is applied regardless of whether FIELDQUOTE is or isn't enabled. It won't be used to escape the quoting character. The quoting character must be escaped with another quoting character. The quoting character can appear within a column value only if the value is enclosed with quoting characters.
  • FIELDTERMINATOR = 'field_terminator' specifies the field terminator to be used. The default field terminator is a comma (",").
  • ROWTERMINATOR = 'row_terminator' specifies the row terminator to be used. The default row terminator is a newline character: \r\n.
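As a sketch of how these parsing options combine in one query (the semicolon field terminator and Unix-style line endings are illustrative assumptions about the source file, not requirements):

```sql
SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIELDTERMINATOR = ';',   -- fields separated by semicolons instead of the default comma
    ROWTERMINATOR = '0x0a'   -- rows end with a bare line feed (Unix-style), given as a hex code
) WITH (C1 int, C2 varchar(20), C3 varchar(max)) AS rows
```

Terminators that aren't printable characters, such as a lone line feed, are given as hexadecimal codes.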

Query Delta Lake format

To query the Delta Lake source data, use FORMAT = 'DELTA' and point to the root folder that contains your Delta Lake files.

SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder',
    FORMAT = 'DELTA'
) WITH (C1 int, C2 varchar(20), C3 varchar(max)) AS rows

The root folder must contain a subfolder named _delta_log. See the Query Delta Lake format article for usage examples.

File schema

The SQL language in Synapse SQL lets you define the schema of the file as part of the OPENROWSET function and read all or a subset of the columns, or it tries to automatically determine the column types from the file by using schema inference.

Read a selected subset of columns

To specify the columns you want to read, you can provide an optional WITH clause within your OPENROWSET statement.

  • If there are CSV data files, to read all the columns, provide column names and their data types. If you want a subset of columns, use ordinal numbers to pick the columns from the source data files by ordinal. Columns are bound by ordinal designation.
  • If Parquet data files exist, provide column names that match the column names in the source data files. Columns are linked by name.
SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
    FORMAT = 'PARQUET'
) WITH (C1 int, C2 varchar(20), C3 varchar(max)) AS rows

For each column, you must specify the column name and its data type in the WITH clause. For examples, see Read CSV files without specifying all columns.
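For the CSV case, reading a subset of columns by ordinal can be sketched like this (the ordinal after each column type binds it to a position in the source file; the column names are illustrative):

```sql
SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) WITH (
    C2 varchar(20) 2,   -- bind C2 to the second column in the file
    C3 varchar(max) 3   -- bind C3 to the third column in the file
) AS rows
```

Columns not listed in the WITH clause are skipped, which can reduce the amount of data read.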

Schema Inference

By omitting the WITH clause from the OPENROWSET statement, you can instruct the service to autodetect (infer) the schema from the underlying files.

SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/data.parquet',
    FORMAT = 'PARQUET'
)

Make sure that appropriate inferred data types are used for optimal performance.

Query multiple files or folders

To run a T-SQL query over a set of files within a folder or set of folders, while treating them as a single entity or rowset, provide a path to a folder or a pattern (using wildcards) over a set of files or folders.

The following rules apply:


  • Patterns can appear either in part of a directory path or in a file name.
  • Several patterns can appear in the same directory step or file name.
  • If there are multiple wildcards, then files within all matching paths will be included in the resulting file set.
SELECT *
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/myroot/*/mysubfolder/*.parquet',
    FORMAT = 'PARQUET'
) AS files

See the Query folders and multiple files article for usage examples.

Filename function

This function returns the name of the file that the row originates from.

To refer to specific files, see the Filename section of the Query specific files article.

The return data type is nvarchar(1024). For best performance, always convert the result of the filename function to the correct data type. If you use the character data type, be sure to use the correct length.
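A sketch of the usual pattern: counting rows per source file across a wildcard path (the path itself is illustrative):

```sql
SELECT
    r.filename() AS file_name,   -- name of the file each row came from
    COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/*.parquet',
    FORMAT = 'PARQUET'
) AS r
GROUP BY r.filename()
```

The function is invoked through the rowset alias (r here), which is why the OPENROWSET call needs an alias.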

Filepath function

This function returns a full path or part of the path:

  • When called without a parameter, it returns the full path of the file the row originates from.
  • When called with a parameter, it returns part of the path that matches the wildcard on the position specified in the parameter. For example, parameter value 1 would return part of the path that matches the first wildcard.

For more information, see the Filepath section of the Query specific files article.

The return data type is nvarchar(1024). For best performance, always convert the result of the filepath function to the correct data type. If you use the character data type, be sure to use the correct length.
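As an illustrative sketch over the partitioned demo taxi data (the year=*/month=* folder layout is an assumption about how the files are organized), the filepath function can surface the partition values matched by each wildcard:

```sql
SELECT
    r.filepath(1) AS [year],    -- part of the path matching the first wildcard
    r.filepath(2) AS [month],   -- part of the path matching the second wildcard
    COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK N'https://myaccount.dfs.core.windows.net/mycontainer/parquet/taxi/year=*/month=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r
GROUP BY r.filepath(1), r.filepath(2)
```

Filtering on filepath(n) in a WHERE clause also lets the service skip folders that can't match, a common partition-elimination technique.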

Work with complex types and nested or repeating data structures

To provide a smooth experience with data stored in nested or repeated data types, such as in Parquet files, serverless SQL pool has added the following extensions.

Project nested or repeated data

To project the data, run a SELECT statement over the Parquet file that contains columns of nested data types. On output, nested values are serialized into JSON and returned as a varchar(8000) SQL data type.


SELECT *
FROM OPENROWSET(
    BULK 'unstructured_data_path',
    FORMAT = 'PARQUET'
) [AS alias]

For more information, see the Project nested or repeated data section of the Query Parquet nested types article.

Access elements from nested columns

To access nested elements from a nested column, such as a Struct, use dot notation to concatenate field names into the path. Provide the path as the column_name in the WITH clause of the OPENROWSET function.

The example syntax snippet is as follows:

OPENROWSET(
    BULK 'unstructured_data_path',
    FORMAT = 'PARQUET'
)
WITH ({'column_name' 'column_type',})
[AS alias]
'column_name' ::= '[field_name.] field_name'

By default, the OPENROWSET function matches the source field name and path with the column names provided in the WITH clause. Elements contained at different nesting levels within the same source Parquet file can be accessed via the WITH clause.
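A minimal sketch of the dot notation, assuming a hypothetical Parquet file with a Struct column named person that contains a nested name field (both names are illustrative):

```sql
SELECT [person.name]
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/nested.parquet',
    FORMAT = 'PARQUET'
) WITH (
    [person.name] varchar(100)   -- dot notation binds this column to the nested field person.name
) AS rows
```

The bracketed column name carries the whole path, so the nested field is returned as an ordinary scalar column.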

Return values

  • The function returns a scalar value, such as int, decimal, or varchar, from the specified element on the specified path, for all Parquet types that aren't in the nested type group.
  • If the path points to an element of a nested type, the function returns a JSON fragment starting from the top element on the specified path. The JSON fragment is of type varchar(8000).
  • If the property can't be found at the specified column name, the function returns an error.
  • If the property can't be found at the specified column path, then depending on the path mode, the function returns an error when in strict mode or null when in lax mode.

For query examples, see the Access elements from nested columns section of the Query Parquet nested types article.

Access elements from repeated columns

To access elements from a repeated column, such as an element of an Array or Map, use the JSON_VALUE function for every scalar element you need to project, and provide:

  • The nested or repeated column, as the first parameter
  • A JSON path that specifies the element or property to access, as the second parameter

To access non-scalar elements from a repeated column, use the JSON_QUERY function for every non-scalar element you need to project, and provide:

  • The nested or repeated column, as the first parameter
  • A JSON path that specifies the element or property to access, as the second parameter

See the following syntax snippet:


SELECT
    { JSON_VALUE (column_name [, path_to_sub_element]), }
    { JSON_QUERY (column_name [, path_to_sub_element]), }
FROM OPENROWSET(
    BULK 'unstructured_data_path',
    FORMAT = 'PARQUET'
) [AS alias]

For query examples that access elements from repeated columns, see the Query Parquet nested types article.
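A hedged sketch, assuming a hypothetical Parquet file whose persons column holds structs with a scalar name field and a repeated phones array (all three names are illustrative):

```sql
SELECT
    JSON_VALUE(persons, '$.name')   AS person_name,  -- scalar element extracted via a JSON path
    JSON_QUERY(persons, '$.phones') AS phones_json   -- array element returned as a JSON fragment
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/mycontainer/mysubfolder/nested.parquet',
    FORMAT = 'PARQUET'
) WITH (persons varchar(max)) AS rows
```

JSON_VALUE yields a scalar, while JSON_QUERY preserves the nested structure as JSON text for further processing.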

Query examples

Use the example queries to learn more about querying different types of data.

Tools

You need these tools to issue queries:

  • Azure Synapse Studio
  • Azure Data Studio
  • SQL Server Management Studio

Demo setup

Your first step is to create a database where you'll execute the queries. Then initialize the objects by executing a setup script on that database.

This setup script creates the data sources, database scope credentials, and external file formats used to read the data in these examples.

Note

Databases are used only for viewing metadata, not for actual data. Write down the database name that you use; you'll need it later.

CREATE DATABASE mydbname;

Demo data provided

The demo data contains the following datasets:


  • NYC Taxi - Yellow Taxi Trip Records - Part of NYC public data set in CSV and Parquet format
  • Population data set in CSV format
  • Examples of Parquet files with nested columns
  • Books in JSON format
Folder path | Description
/csv/ | Parent folder for data in CSV format
/csv/population/
/csv/population-unix/
/csv/population-unix-hdr/
/csv/population-unix-hdr-escape/
/csv/population-unix-hdr-quoted/ | Folders with population data files in various CSV formats
/csv/taxi/ | Folder with NYC public data files in CSV format
/parquet/ | Parent folder for data in Parquet format
/parquet/taxi/ | NYC public data files in Parquet format, partitioned by year and month by using the Hive/Hadoop partitioning scheme
/parquet/nested/ | Sample Parquet files with nested columns
/json/ | Parent folder for data in JSON format
/json/books/ | JSON files with book data

Next Steps

For more information about querying different file types and creating and using views, see the following articles:

  • Query CSV files
  • Query Parquet files
  • Query JSON files
  • Query nested values
  • Querying folders and multiple CSV files
  • Use file metadata in queries
  • Create and use views

