Use Azure Storage logs to analyze Synapse Analytics Serverless SQL pool activity - Serverless SQL (2023)

This blog is tagged as intermediate knowledge, as an understanding of serverless SQL pools is preferred but not essential. Refer to Introduction to serverless SQL pools to understand the service and set up a serverless database.


Overview

When working with serverless SQL pools and data stored in Azure Storage, it's helpful to review activity in your storage account to ensure it is as expected. For example, if you use the serverless SQL pool filepath function or query partitioned Delta data, the Azure Storage logs show the folders and files scanned. You can then verify that SQL queries using the partition columns only read the required folders and files. This is important to keep the amount of data processed as small as possible (as you will be billed for it).

In this blog, we cover how to configure Azure Storage logs in Azure Monitor and generate query activity using serverless SQL pools. Then we'll use serverless SQL pools and T-SQL to analyze the generated logs.

In a future blog post we will see how to configure Log Analytics and query Azure Storage activity using KQL (Kusto Query Language).

The SQL code for this example is available in the serverlesssqlpooltools GitHub repository here.

Data processed by serverless SQL pools

We can see how much data is processed when running a SQL query against data lake data in the Monitor pane in Synapse Studio. This is very useful for checking how much data each SQL query processes and the cost associated with each query. It also helps in troubleshooting performance issues and flagging unoptimized workloads.
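As an aside, if you prefer T-SQL over the Monitor pane, serverless SQL pools also expose their aggregate consumption through the sys.dm_external_data_processed management view, which reports the data processed for the current day, week, and month. A minimal sketch:

--check aggregate data processed by the serverless SQL pool
SELECT type, data_processed_mb
FROM sys.dm_external_data_processed;

This returns one row per period type (daily, weekly, monthly), which is a quick way to sanity-check overall usage before digging into individual queries.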


However, we can get more detailed telemetry by looking at the storage logs. These show us the actual folders and files in the data lake that are being processed by serverless SQL pools.

Pricing

There is a cost associated with enabling logging: currently £0.256 per GB for the logging itself (prices here), plus the cost of the storage account that holds the logs (prices here), e.g. £0.0156 per GB for standard hot storage.
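As a rough worked example using the figures above (illustrative only, check current pricing): if your queries generate 10 GB of read logs in a month, the logging itself costs about 10 × £0.256 = £2.56, plus roughly 10 × £0.0156 ≈ £0.16 per month to retain those logs in a standard storage account.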

Azure Storage log configuration

We'll now walk through creating a storage account to store the logs, setting up logging on an Azure Data Lake Gen2 account (logs cannot be written to the same storage account being monitored), generating activity by running serverless SQL pool queries, and then analyzing the logs.

Configure a general-purpose storage account

To store the logs, we need to create a general-purpose v2 storage account. In the following example, a new Azure Storage account was set up with Standard performance and locally-redundant storage. On the Advanced tab, Enable hierarchical namespace was left disabled. All other settings were left at their defaults.

(Image) Creating the general-purpose v2 storage account

Set up storage read logging

After configuring the general-purpose v2 account, we can now configure an Azure Data Lake Gen2 account to log activity on that account. In the following example, we set up logging on an account holding data that serverless SQL pools have query access to.

Find the relevant Azure Data Lake Gen2 account in the Azure portal, then click Diagnostic settings under the Monitoring section in the left menu.

(Image) Diagnostic settings under the Monitoring section

Click blob under the storage account name in the tree list, and then click Add diagnostic setting.

(Image) Adding a diagnostic setting for the blob service

Enter the following information in the diagnostic setting pane:

  • Enter a name for the diagnostic setting
  • Enable StorageRead under the log categories (you can also enable StorageWrite if you write data back to the data lake using serverless SQL pools)
  • Choose Archive to a storage account and select the general-purpose storage account created in the previous step
  • Click Save
(Image) Diagnostic setting configuration

Now that we've configured the Azure Data Lake Gen2 account to log read activity, we can generate activity using serverless SQL pool queries and view the results in the logs.
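At this point, any serverless SQL pool read against the monitored account will produce StorageRead entries. Even a quick ad hoc query is enough to generate activity; here's a minimal sketch with a hypothetical storage account, container, and folder:

--a hypothetical ad hoc query purely to generate StorageRead log activity
SELECT COUNT(*) AS RowsRead
FROM OPENROWSET(
    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/somefolder/*.parquet',
    FORMAT = 'Parquet'
) AS r;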

(Video) Building a Data Warehouse Dimensional Model using Azure Synapse Analytics SQL Serverless

Run queries against serverless SQL pools

Now let's create a view in a serverless SQL pool database that includes 3 columns generated by the filepath function (more information here). We can use these columns for partition pruning, processing only the data in the folders we specify in the WHERE clause. The logs should then show which folders were scanned when a serverless SQL query was issued.

The data set used is approximately 1 billion rows of web telemetry data from 09/2021 to 12/2021.

Create the view

In the following view definition, the OPENROWSET command references web telemetry event data saved in the Parquet file format in a partitioned folder structure \EventYear=YYYY\EventMonth=MM\EventDate=YYYY-MM-DD\. The wildcards (*) in the BULK statement are surfaced as the FolderEventYear, FolderEventMonth, and FolderEventDate columns in the view, allowing filters on them in the WHERE clause. Note that there are three other date fields, EventYear, EventMonth, and EventDate: these columns are actual dates stored in the Parquet source files. We use these columns to illustrate the difference between filtering on the file path columns and filtering on columns within the Parquet files themselves.

A serverless SQL pool database has already been created, and a data source has been added that points to the Azure Data Lake Gen2 account.

--create a view over the data lake data
CREATE VIEW LDW.vwWebTelemetryParquet
AS
SELECT  UserID,
        EventType,
        ProductID,
        [URL] AS ProductURL,
        Device,
        SessionViewSeconds,
        FilePathYear AS EventYear,
        FilePathMonth AS EventMonth,
        FilePathDate AS EventDate,
        CAST(fct.filepath(1) AS SMALLINT) AS FolderEventYear,
        CAST(fct.filepath(2) AS TINYINT) AS FolderEventMonth,
        CAST(fct.filepath(3) AS DATE) AS FolderEventDate
FROM OPENROWSET
(
    BULK 'cleansed/webtelemetry/EventYear=*/EventMonth=*/EventDate=*/*.parquet',
    DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
    FORMAT = 'Parquet'
)
WITH
(
    UserID INT,
    EventType VARCHAR(50),
    EventDateTime DATE,
    ProductID SMALLINT,
    URL VARCHAR(50),
    Device VARCHAR(10),
    SessionViewSeconds INT,
    FilePathYear SMALLINT,
    FilePathMonth TINYINT,
    FilePathDate DATE
) AS fct;

After a view is created, we can start executing queries. We will run 3 queries and then analyze the logs.

Query 1: Run a query with no partition pruning

This query aggregates events by the EventType column. It has no filter, so we expect all data in the cleansed/webtelemetry folder to be read. The result statistics in the image below show 1095 MB scanned.

--select all data
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
GROUP BY EventType;
(Image) Query 1 results and statistics: 1095 MB of data processed

Query 2: Run a query with partition pruning

Now let's run a query where we select data and use the FolderEventDate column, which is derived from the filepath function, in the WHERE clause to read only the data in the 2021-10-02 folder. In the logs we should see that serverless SQL pools scanned just one of the folders. The result statistics in the image below show 11 MB scanned.

--filter using the filepath column FolderEventDate
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
WHERE FolderEventDate = '2021-10-02'
GROUP BY EventType;
(Image) Query 2 results and statistics: 11 MB of data processed

Query 3: Run a query with a filter but no partition pruning

This query filters the data and returns the same results as above, except that instead of using the FolderEventDate column (from the filepath function), we use a column stored in the Parquet data. When this query runs, every folder must be scanned before the same result set is returned. In the logs we should see that serverless SQL pools scanned all the folders. The result statistics in the image below show 778 MB scanned.

SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
WHERE EventDate = '2021-10-02'
GROUP BY EventType;
(Image) Query 3 results and statistics: 778 MB of data processed
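A hedged aside before we move on to the logs (this variation is not part of the three test queries): because the folder names and the dates inside the files line up in this data set, you can keep partition pruning and still filter on the real data column by including both predicates:

--sketch: the filepath column prunes folders, the data column guards correctness
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
WHERE FolderEventDate = '2021-10-02'
AND EventDate = '2021-10-02'
GROUP BY EventType;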

Log File Analysis

After running the three SQL queries above, we can now analyze the captured logs in the general-purpose storage account. Logs are stored in a container called insights-logs-storageread, under a folder structure similar to /resourceId=/subscriptions/…/resourceGroups/dhrgsynapseuk/… with partitions for year, month, day, hour, and minute. The following image shows a JSON log file in the root of the date folders. Note that the storage account has been added to the Synapse workspace as a linked service.
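For illustration, a full path to an individual log blob looks something like this (a hypothetical example; Azure Monitor names the hourly log blobs PT1H.json):

/resourceId=/subscriptions/…/resourceGroups/dhrgsynapseuk/providers/Microsoft.Storage/storageAccounts/dhstordatalakeuk/blobServices/default/y=2022/m=07/d=20/h=06/m=00/PT1H.json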


(Image) JSON log files in the insights-logs-storageread container

JSON schema in log files

We can download and open a log file in any text editor to see the JSON; each logged request is a line-delimited JSON message. Certain attributes only appear in certain circumstances. For example, the following request comes from a serverless SQL pool query using managed identity as the security model to query the data lake. We can use attributes like delegatedResource to filter down to only those records whose source system is Synapse.

{ "time":"2022-07-20T06:00:42.4059174Z", "resourceId":"/subscriptions/d496ab56/resourceGroups/dhrgsynapseuk/providers/Microsoft.Storage/storageAccounts/dhstordatalakeuk/blobServices/default", "category" :"StorageRead", "operationName":"ReadFile", "operationVersion":"2018-06-17", "schemaVersion":"1.0", "statusCode":206, "statusText":"Success", "durationMs" : 153, "CallerIpaddress": "10.0.0.15", "corralationId": "A5A84413-501f-0033-30fe-9ba79b000000", "Identity": {"Type": "Oauth", "Tokenhash": "AA5DB00B973961BF47573333334 "autorisiert ":[ { "acción":"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read", "roleAssignmentId":"7d0c900a", "roleDefinitionId":"ba92f5b4", "directores":[ { "id ": "60317038", "tipo":"ServicePrincipal" } ], "denyAssignmentId":"" } ], "solicitante":{ "appId":"04b6f050", "audiencia":"https://storage.azure .com /", "objectId":"60317038", "tenantId":"6ec2ccb9", "tokenIssuer":"https://sts.windows.net/6ec2ccb9/" }, "delegatedRes fuente":{ "resourceId":"/subscriptions/dfdsfds/resourcegroups/dhrgsynapseuk/providers/Microsoft.Synapse/workspaces/dhsynapseuk", "objectId":"/subscriptions/dfdsfds/resourcegroups/dhrgsynapseuk/providers/Microsoft.Synapse/workspaces /dhsynapseuk", "tenantId": "45ef4t4 " } }, "ubicación":"UK South", "properties":{ "accountName":"dhstordatalakeuk", "userAgentHeader":"SQLBLOBACCESS", "serviceType":"blob", "objectKey":"/ dhstordatalakeuk/datalakehouseuk/curated/webtelemetry/EventYear=2022/EventMonth=2/EventDate=2022-02-24/part-00023-52ae6d7c-a056-4827-8109-e4e1bb2782e6.c000.snappy. Parkett", "lastModifiedTime":"2022/04/07 10:22:24.1726814", "conditionsUsed":"If-Match=\"0x8DA188081F7C95E\"", "metricResponseType":"Sucesso", "serverLatencyMs":41, "requestHeaderSize": 1863, "responseHeaderSize": 398, "responseBodySize": 2097152, "tlsVersion": "TLS 1.2", "downloadRange": "bytes=41975061-44072212" }, "uri": "https://dhstordatalakeuk .dfs.core.windows.net/datalakehouseuk/curated/webtelemetry/EventYear=2022/Even tMonth=2/EventDate=2022-02-24/part-00023-52ae6d7c-a056-4827-8109-e4e1bb2782e6.c000.snappy. parkett", "protocolo":"HTTPS", "resourceType":"Microsoft.Storage/ cuentas de almacenamiento/servicios de blob"}

View logs in serverless SQL pools

We will use serverless SQL pools to query the log files, as they support querying JSON structures. The storage account has been added to the Synapse workspace as a linked service, and we can create a view in a serverless SQL pool database using the log path in the OPENROWSET command. Wildcards have also been added over the date-partitioned folders, /y=*/m=*/d=*/h=*/m=*/, in the BULK statement so the filepath function can be used to filter to specific time periods. 8 columns have been added to the view to allow filtering on these folders.

The SQL code for this example is available in the serverlesssqlpooltools GitHub repository here.

--create and configure a new serverless SQL pools database
CREATE DATABASE LogAnalysis;
USE LogAnalysis;

--create a master key to allow authentication
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'dsfads$%zdsfkjsdhlk456hvwegf';

--ensure the Synapse workspace managed identity has been granted the
--Storage Blob Data Reader role on the general-purpose storage account
CREATE DATABASE SCOPED CREDENTIAL DataLakeManagedIdentity
WITH IDENTITY = 'Managed Identity';

--create a data source for the general-purpose storage account
--replace <storage account> with the relevant value
CREATE EXTERNAL DATA SOURCE ExternalDataSourceDataLakeMI
WITH
(
    LOCATION = 'https://<storage account>.blob.core.windows.net/insights-logs-storageread',
    CREDENTIAL = DataLakeManagedIdentity
);

--enable support for UTF8
ALTER DATABASE LogAnalysis COLLATE Latin1_General_100_BIN2_UTF8;

--create a view over the storage logs
CREATE OR ALTER VIEW dbo.vwAnalyseLogs
AS
SELECT  time,
        resourceId,
        category,
        operationName,
        operationVersion,
        schemaVersion,
        statusCode,
        statusText,
        durationMs,
        callerIpAddress,
        correlationId,
        identity_type,
        identity_tokenHash,
        [location],
        identity_delegatedResource_resourceId,
        properties_accountName,
        properties_serviceType,
        properties_objectKey,
        properties_metricResponseType,
        properties_serverLatencyMs,
        properties_requestHeaderSize,
        properties_responseHeaderSize,
        properties_responseBodySize,
        properties_tlsVersion,
        uri,
        protocol,
        resourceType,
        jsonrows.filepath(1) AS SubscriptionID,
        jsonrows.filepath(2) AS ResourceGroup,
        jsonrows.filepath(3) AS StorageAccount,
        jsonrows.filepath(4) AS LogYear,
        jsonrows.filepath(5) AS LogMonth,
        jsonrows.filepath(6) AS LogDay,
        jsonrows.filepath(7) AS LogHour,
        jsonrows.filepath(8) AS LogMinute
FROM OPENROWSET
(
    BULK '/resourceId=/subscriptions/*/resourceGroups/*/providers/Microsoft.Storage/storageAccounts/*/blobServices/default/y=*/m=*/d=*/h=*/m=*/*',
    DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIELDTERMINATOR = '0x09',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0A'
)
WITH (doc NVARCHAR(4000)) AS jsonrows
CROSS APPLY OPENJSON (doc)
WITH
(
    time DATETIME2 '$.time',
    resourceId VARCHAR(500) '$.resourceId',
    category VARCHAR(50) '$.category',
    operationName VARCHAR(100) '$.operationName',
    operationVersion VARCHAR(10) '$.operationVersion',
    schemaVersion VARCHAR(10) '$.schemaVersion',
    statusCode SMALLINT '$.statusCode',
    statusText VARCHAR(100) '$.statusText',
    durationMs INT '$.durationMs',
    callerIpAddress VARCHAR(50) '$.callerIpAddress',
    correlationId VARCHAR(50) '$.correlationId',
    identity_type VARCHAR(100) '$.identity.type',
    identity_tokenHash VARCHAR(100) '$.identity.tokenHash',
    [location] VARCHAR(50) '$.location',
    identity_delegatedResource_resourceId VARCHAR(500) '$.identity.delegatedResource.resourceId',
    properties_accountName VARCHAR(50) '$.properties.accountName',
    properties_serviceType VARCHAR(30) '$.properties.serviceType',
    properties_objectKey VARCHAR(250) '$.properties.objectKey',
    properties_metricResponseType VARCHAR(50) '$.properties.metricResponseType',
    properties_serverLatencyMs INT '$.properties.serverLatencyMs',
    properties_requestHeaderSize INT '$.properties.requestHeaderSize',
    properties_responseHeaderSize INT '$.properties.responseHeaderSize',
    properties_responseBodySize INT '$.properties.responseBodySize',
    properties_tlsVersion VARCHAR(10) '$.properties.tlsVersion',
    uri VARCHAR(500) '$.uri',
    protocol VARCHAR(50) '$.protocol',
    resourceType VARCHAR(250) '$.resourceType'
)

Now we can run SQL queries against the logs to see the results. The two queries below aggregate the activity by the EventMonth and EventDate folders of the source data. You can also use the time column within the JSON data itself to pinpoint activity more precisely.

--aggregate by source EventMonth folder and show how many unique files were scanned
SELECT statusText,
       CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT) AS URIFolderMonth,
       COUNT(DISTINCT uri) AS FileScanCount
FROM dbo.vwAnalyseLogs
WHERE LogYear = 2022
AND LogMonth = '07'
AND LogDay = '20'
AND LogHour = '20'
AND operationName = 'ReadFile'
AND identity_delegatedResource_resourceId LIKE '%dhsynapsews%' --synapse workspace
GROUP BY statusText,
         CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT)
ORDER BY 2;

--aggregate by source EventMonth and EventDate folders and show how many unique files were scanned
SELECT statusText,
       CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT) AS URIFolderMonth,
       SUBSTRING(uri, PATINDEX('%EventDate=%', uri) + 10, 10) AS URIFolderDate,
       COUNT(DISTINCT uri) AS FileScanCount
FROM dbo.vwAnalyseLogs
WHERE LogYear = 2022
AND LogMonth = '07'
AND LogDay = '20'
AND LogHour = '12'
AND operationName = 'ReadFile'
AND identity_delegatedResource_resourceId LIKE '%dhsynapsews%' --synapse workspace
GROUP BY statusText,
         CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT),
         SUBSTRING(uri, PATINDEX('%EventDate=%', uri) + 10, 10)
ORDER BY 3;
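Since the view also exposes the time column parsed from the JSON itself, a narrower window can be selected directly rather than via the log folder columns; a minimal sketch with illustrative timestamps:

--filter on the JSON time column rather than the log folder columns
SELECT operationName,
       statusText,
       COUNT(*) AS Requests
FROM dbo.vwAnalyseLogs
WHERE time >= '2022-07-20 06:00' AND time < '2022-07-20 06:15'
GROUP BY operationName, statusText;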

For the first query we ran above, we can see the folders and files scanned. Aggregating the logs shows that all available month folders (September to December 2021) were scanned.

(Image) Log query results: all month folders (September to December 2021) scanned

For the second query, if we run the same 2 log SQL statements as above (with the filters modified to select the appropriate logs) to see which folders were scanned, we see that only the EventMonth=10 folder was read. If we also query to see which EventDate folders were scanned, we see that only the EventDate=2021-10-02 folder was read. Serverless SQL pools successfully pruned partitions for the query when the filepath column was used in the WHERE clause.

(Image) Log query results: only the EventMonth=10 and EventDate=2021-10-02 folders scanned

Finally, if we look at the logs for the third query, where we filtered the data using a column in the Parquet data rather than a file path column, we can see that all folders and files were scanned. This is because the column resides in the Parquet files themselves, and serverless SQL pools must scan every file to find the relevant values and return the results. This significantly increases the amount of data processed, even though we get the same results as the second query, which used the file path column for filtering.

(Image) Log query results: all folders and files scanned

Conclusion

In this blog, we covered how to configure logging with Azure Monitor to record activity between serverless SQL pools and Azure Data Lake Gen2. We then ran several queries to generate activity and analyzed that activity by using serverless SQL pools to query the log files. In the next part of this series, we'll look at using Log Analytics to collect and view activity between serverless SQL pools and Azure Data Lake Gen2. Log Analytics is preferable in an environment where activity is monitored across multiple systems, as it provides a central location to view logs.

We might also consider using the metrics contained in the logs to calculate the size of the data read and compare it with serverless SQL pool monitoring, perhaps something for a future blog update.
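As a speculative starting point for that comparison, something like the following sketch sums the response sizes recorded in the logs (note that response bytes are not the same measure as the billed data processed, so expect the figures to differ):

--sum bytes returned per day to compare loosely against Synapse Studio figures
SELECT LogYear,
       LogMonth,
       LogDay,
       SUM(CAST(properties_responseBodySize AS BIGINT)) / 1048576 AS MBReturned
FROM dbo.vwAnalyseLogs
WHERE operationName = 'ReadFile'
GROUP BY LogYear, LogMonth, LogDay;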

