This blog is tagged as intermediate knowledge as an understanding of serverless SQL pools is preferred but not essential. Refer to Introduction to Serverless SQL Pools to understand the service and set up a serverless database.
Overview
When working with serverless SQL pools and data stored in Azure Storage, it's helpful to review activity in your storage account to ensure it is as expected. For example, if you use the serverless SQL pools partitioning function filepath, or query partitioned Delta data, the Azure Storage logs show the folders and files scanned. You can then verify that SQL queries using the partition columns only read the required folders and files. This is important to keep the amount of data processed as small as possible (as you are billed for it).
In this blog, we cover how to configure Azure Storage logging in Azure Monitor and generate query activity using serverless SQL pools. We'll then use serverless SQL pools and T-SQL to analyze the logs generated.
In a future blog post we'll look at configuring Log Analytics and querying Azure Storage activity using KQL (Kusto Query Language).
The SQL code for this example is available in the serverlesssqlpooltools GitHub repository here.
Data processed by serverless SQL pools
We can see how much data is processed when running a SQL query against data lake data in the Monitor pane in Synapse Studio. This is very useful for checking how much data each SQL query processes and the cost associated with each query. It also helps in troubleshooting performance issues and flagging unoptimised workloads.
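The Monitor pane isn't the only place to check this: serverless SQL pools also expose the aggregate amount of data processed through a system view. A minimal T-SQL sketch, assuming the sys.dm_external_data_processed DMV described in the serverless SQL pool cost-control documentation (verify the exact column names there before relying on it):

--check how much data the serverless SQL pool has processed
--one row per period (daily, weekly, monthly), assuming this DMV is available
SELECT *
FROM sys.dm_external_data_processed;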
However, we can get more detailed telemetry data by looking at the storage logs. This really shows us the folders and files in the data lake that are being processed by the serverless SQL pools.
Costs
There is a cost associated with enabling logging: currently logs delivered to a storage account are charged at £0.256 per GB for the log processing itself (prices here), plus the storage account cost for holding the logs (prices here), e.g. £0.0156 per GB for standard hot storage.
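As a rough illustration using the prices quoted above: if serverless SQL pool activity generated 10 GB of logs in a month, the log processing would cost around 10 × £0.256 = £2.56, plus roughly 10 × £0.0156 ≈ £0.16 per month to keep those logs in standard hot storage.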
Azure storage log configuration
We'll now walk through creating a storage account to store the logs, configuring logging on an Azure Data Lake Gen2 account (we can't send logs to the same storage account that we're monitoring), generating activity by running serverless SQL pool queries, and then analyzing the logs.
Configure a general-purpose storage account
To store the logs, we need to create a general-purpose v2 storage account. In the following example, a new Azure Storage account was created with Standard performance and locally-redundant storage. On the Advanced tab, Enable hierarchical namespace was left disabled. All other settings were left at their defaults.

Configure storage read logging
After configuring the general-purpose v2 account, we can now configure an Azure Data Lake Gen2 account to log activity. In the following example, we configure logging on an account holding data that serverless SQL pools have query access to.
Find the relevant Azure Data Lake Gen2 account in the Azure portal and click Diagnostic settings in the Monitoring section of the left-hand menu.

Click blob under the storage account name in the tree list, and then click Add diagnostic setting.

Enter the following information in the diagnostic setting window:
- Enter a name for the diagnostic configuration
- Enable StorageRead under the log Categories (you can also enable StorageWrite if you're writing data back to the data lake using serverless SQL pools)
- Choose Archive to a storage account and select the general-purpose storage account created in the previous step
- Click Save

Now that we've configured the Azure Data Lake Gen2 account to log read activity, we can generate activity using serverless SQL pool queries and view the results in the logs.
Run queries against serverless SQL pools
Now let's create a view in a serverless SQL pools database that contains 3 columns using the filepath function (more information here). We can use these columns for partition pruning and only process data in the folders specified in the WHERE clause. The logs should then show the folders queried when serverless SQL pools ran the query.
The data set used is approximately 1 billion rows of web telemetry data from 09/2021 to 12/2021.
Create the view
In the following view definition, the OPENROWSET command references web telemetry event data saved in Parquet format in a folder partition structure of \EventYear=YYYY\EventMonth=MM\EventDate=YYYY-MM-DD\. The * wildcards in the BULK statement are surfaced as the FolderEventYear, FolderEventMonth, and FolderEventDate columns in the view, allowing filters on them in the WHERE clause. Note that there are three other date fields, EventYear, EventMonth, and EventDate; these columns are actual dates stored in the Parquet source files. We use these columns to illustrate the difference between filtering data using the file path columns and using columns within the Parquet files themselves.
A serverless SQL pools database has already been created and a data source has been added that points to Azure Data Lake Gen2.
--create a view over the data lake data
CREATE VIEW LDW.vwWebTelemetryParquet
AS
SELECT UserID,
       EventType,
       ProductID,
       [URL] AS ProductURL,
       Device,
       SessionViewSeconds,
       FilePathYear AS EventYear,
       FilePathMonth AS EventMonth,
       FilePathDate AS EventDate,
       CAST(fct.filepath(1) AS SMALLINT) AS FolderEventYear,
       CAST(fct.filepath(2) AS TINYINT) AS FolderEventMonth,
       CAST(fct.filepath(3) AS DATE) AS FolderEventDate
FROM OPENROWSET(
    BULK 'cleansed/webtelemetry/EventYear=*/EventMonth=*/EventDate=*/*.parquet',
    DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
    FORMAT = 'Parquet'
)
WITH(
    UserID INT,
    EventType VARCHAR(50),
    EventDateTime DATE,
    ProductID SMALLINT,
    URL VARCHAR(50),
    Device VARCHAR(10),
    SessionViewSeconds INT,
    FilePathYear SMALLINT,
    FilePathMonth TINYINT,
    FilePathDate DATE
) AS fct;
After the view is created, we can start executing queries. We'll run 3 queries and then analyze the logs.
Query 1: Run query with no partition pruning
This query aggregates events by the EventType column. It has no filter, so we expect all the data in the cleansed/webtelemetry folder to be read. The result statistics image below shows 1095MB scanned.
--select all data
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
GROUP BY EventType


Query 2: Run query with partition pruning
Now let's run a query where we select data and use the FolderEventDate column, which is the result of the filepath function, in the WHERE clause to read only the data in the 2021-10-02 folder. In the logs we should see that serverless SQL pools scanned only one of the folders. The result statistics image below shows 11MB scanned.
--filter using the filepath column FolderEventDate
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
WHERE FolderEventDate = '2021-10-02'
GROUP BY EventType


Query 3: Run query with filtering but no partition pruning
This query filters the data and returns the same results as above, except that instead of using the FolderEventDate column (from the filepath function), we use the EventDate column stored in the Parquet data itself. When this query runs, all folders must be scanned before the same result set is returned. In the logs we should see that serverless SQL pools scanned every folder. The result statistics image below shows 778MB scanned.
--filter using a column from within the Parquet files
SELECT EventType,
       COUNT(*) AS EventCount
FROM LDW.vwWebTelemetryParquet
WHERE EventDate = '2021-10-02'
GROUP BY EventType


Log File Analysis
After running the three SQL queries above, we can now analyze the captured logs in the general-purpose storage account. The logs are stored in a container called insights-logs-storageread, in a folder structure similar to /resourceId=/subscriptions/…/resourceGroups/dhrgsynapseuk/… with partitions for year, month, day, hour and minute. The following image shows a JSON log file within the date-partitioned folders. Note that the storage account has been added to the Synapse workspace as a linked service.

JSON schema in log files
We can download and open a log file in any text editor to see the JSON; each logged request is a line-delimited JSON message. Certain attributes only appear under certain circumstances. For example, the following request is from a serverless SQL pools query that uses managed identity as the security model to query the data lake. We can use attributes like delegatedResource to filter for only those records whose source system is Synapse.
{ "time":"2022-07-20T06:00:42.4059174Z", "resourceId":"/subscriptions/d496ab56/resourceGroups/dhrgsynapseuk/providers/Microsoft.Storage/storageAccounts/dhstordatalakeuk/blobServices/default", "category" :"StorageRead", "operationName":"ReadFile", "operationVersion":"2018-06-17", "schemaVersion":"1.0", "statusCode":206, "statusText":"Success", "durationMs" : 153, "CallerIpaddress": "10.0.0.15", "corralationId": "A5A84413-501f-0033-30fe-9ba79b000000", "Identity": {"Type": "Oauth", "Tokenhash": "AA5DB00B973961BF47573333334 "autorisiert ":[ { "acción":"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read", "roleAssignmentId":"7d0c900a", "roleDefinitionId":"ba92f5b4", "directores":[ { "id ": "60317038", "tipo":"ServicePrincipal" } ], "denyAssignmentId":"" } ], "solicitante":{ "appId":"04b6f050", "audiencia":"https://storage.azure .com /", "objectId":"60317038", "tenantId":"6ec2ccb9", "tokenIssuer":"https://sts.windows.net/6ec2ccb9/" }, "delegatedRes fuente":{ "resourceId":"/subscriptions/dfdsfds/resourcegroups/dhrgsynapseuk/providers/Microsoft.Synapse/workspaces/dhsynapseuk", "objectId":"/subscriptions/dfdsfds/resourcegroups/dhrgsynapseuk/providers/Microsoft.Synapse/workspaces /dhsynapseuk", "tenantId": "45ef4t4 " } }, "ubicación":"UK South", "properties":{ "accountName":"dhstordatalakeuk", "userAgentHeader":"SQLBLOBACCESS", "serviceType":"blob", "objectKey":"/ dhstordatalakeuk/datalakehouseuk/curated/webtelemetry/EventYear=2022/EventMonth=2/EventDate=2022-02-24/part-00023-52ae6d7c-a056-4827-8109-e4e1bb2782e6.c000.snappy. Parkett", "lastModifiedTime":"2022/04/07 10:22:24.1726814", "conditionsUsed":"If-Match=\"0x8DA188081F7C95E\"", "metricResponseType":"Sucesso", "serverLatencyMs":41, "requestHeaderSize": 1863, "responseHeaderSize": 398, "responseBodySize": 2097152, "tlsVersion": "TLS 1.2", "downloadRange": "bytes=41975061-44072212" }, "uri": "https://dhstordatalakeuk .dfs.core.windows.net/datalakehouseuk/curated/webtelemetry/EventYear=2022/Even tMonth=2/EventDate=2022-02-24/part-00023-52ae6d7c-a056-4827-8109-e4e1bb2782e6.c000.snappy. parkett", "protocolo":"HTTPS", "resourceType":"Microsoft.Storage/ cuentas de almacenamiento/servicios de blob"}
View logs in serverless SQL pools
We'll use serverless SQL pools to query the log files as the service supports querying JSON structures. The storage account has been added to the Synapse workspace as a linked service, and we can create a view in a serverless SQL pools database using the log path in the OPENROWSET command. Wildcards have also been added for the date-partitioned folders /y=*/m=*/d=*/h=*/m=*/ in the BULK statement, allowing the filepath function to filter on specific time periods. 8 columns have been added to the view to allow filtering by these folders.
The SQL code for this example is available in the serverlesssqlpooltools GitHub repository here.
--create and configure a new serverless SQL pools database
CREATE DATABASE LogAnalysis;

USE LogAnalysis;

--create a master key to allow authentication
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'dsfads$%zdsfkjsdhlk456hvwegf';

--ensure the Synapse workspace managed identity has been added to the
--general-purpose storage account as a Storage Blob Data Reader
CREATE DATABASE SCOPED CREDENTIAL DataLakeManagedIdentity
WITH IDENTITY = 'Managed Identity';

--create a data source for the general-purpose storage account
--replace <storage account> with the relevant value
CREATE EXTERNAL DATA SOURCE ExternalDataSourceDataLakeMI
WITH (
    LOCATION = 'https://<storage account>.blob.core.windows.net/insights-logs-storageread',
    CREDENTIAL = DataLakeManagedIdentity
);

--enable support for UTF8
ALTER DATABASE LogAnalysis COLLATE Latin1_General_100_BIN2_UTF8;

--create a view over the storage logs
CREATE OR ALTER VIEW dbo.vwAnalyseLogs
AS
SELECT  time,
        resourceId,
        category,
        operationName,
        operationVersion,
        schemaVersion,
        statusCode,
        statusText,
        durationMs,
        callerIpAddress,
        correlationId,
        identity_type,
        identity_tokenHash,
        [location],
        identity_delegatedResource_resourceId,
        properties_accountName,
        properties_serviceType,
        properties_objectKey,
        properties_metricResponseType,
        properties_serverLatencyMs,
        properties_requestHeaderSize,
        properties_responseHeaderSize,
        properties_responseBodySize,
        properties_tlsVersion,
        uri,
        protocol,
        resourceType,
        jsonrows.filepath(1) AS SubscriptionID,
        jsonrows.filepath(2) AS ResourceGroup,
        jsonrows.filepath(3) AS StorageAccount,
        jsonrows.filepath(4) AS LogYear,
        jsonrows.filepath(5) AS LogMonth,
        jsonrows.filepath(6) AS LogDay,
        jsonrows.filepath(7) AS LogHour,
        jsonrows.filepath(8) AS LogMinute
FROM OPENROWSET(
    BULK '/resourceId=/subscriptions/*/resourceGroups/*/providers/Microsoft.Storage/storageAccounts/*/blobServices/default/y=*/m=*/d=*/h=*/m=*/*',
    DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIELDTERMINATOR = '0x09',
    FIELDQUOTE = '0x0b',
    ROWTERMINATOR = '0x0A'
) WITH (doc NVARCHAR(4000)) AS jsonrows
CROSS APPLY OPENJSON(doc)
WITH (
    time DATETIME2 '$.time',
    resourceId VARCHAR(500) '$.resourceId',
    category VARCHAR(50) '$.category',
    operationName VARCHAR(100) '$.operationName',
    operationVersion VARCHAR(10) '$.operationVersion',
    schemaVersion VARCHAR(10) '$.schemaVersion',
    statusCode SMALLINT '$.statusCode',
    statusText VARCHAR(100) '$.statusText',
    durationMs INT '$.durationMs',
    callerIpAddress VARCHAR(50) '$.callerIpAddress',
    correlationId VARCHAR(50) '$.correlationId',
    identity_type VARCHAR(100) '$.identity.type',
    identity_tokenHash VARCHAR(100) '$.identity.tokenHash',
    [location] VARCHAR(50) '$.location',
    identity_delegatedResource_resourceId VARCHAR(500) '$.identity.delegatedResource.resourceId',
    properties_accountName VARCHAR(50) '$.properties.accountName',
    properties_serviceType VARCHAR(30) '$.properties.serviceType',
    properties_objectKey VARCHAR(250) '$.properties.objectKey',
    properties_metricResponseType VARCHAR(50) '$.properties.metricResponseType',
    properties_serverLatencyMs INT '$.properties.serverLatencyMs',
    properties_requestHeaderSize INT '$.properties.requestHeaderSize',
    properties_responseHeaderSize INT '$.properties.responseHeaderSize',
    properties_responseBodySize INT '$.properties.responseBodySize',
    properties_tlsVersion VARCHAR(10) '$.properties.tlsVersion',
    uri VARCHAR(500) '$.uri',
    protocol VARCHAR(50) '$.protocol',
    resourceType VARCHAR(250) '$.resourceType'
);
Now we can run SQL queries against the logs to see the results. The two queries below aggregate the activity by the EventMonth and EventDate folders of the source data. You can also use the time column within the JSON data itself to filter more precisely (a short sketch of this follows the two queries below).
--aggregate by the source EventMonth folder and show how many distinct files were scanned
SELECT statusText,
       CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT) AS URIFolderMonth,
       COUNT(DISTINCT uri) AS FileScanCount
FROM dbo.vwAnalyseLogs
WHERE LogYear = 2022
AND LogMonth = '07'
AND LogDay = '20'
AND LogHour = '20'
AND operationName = 'ReadFile'
AND identity_delegatedResource_resourceId LIKE '%dhsynapsews%' --synapse workspace
GROUP BY statusText,
         CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT)
ORDER BY 2

--aggregate by the source EventMonth and EventDate folders and show how many distinct files were scanned
SELECT statusText,
       CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT) AS URIFolderMonth,
       SUBSTRING(uri, PATINDEX('%EventDate=%', uri) + 10, 10) AS URIFolderDate,
       COUNT(DISTINCT uri) AS FileScanCount
FROM dbo.vwAnalyseLogs
WHERE LogYear = 2022
AND LogMonth = '07'
AND LogDay = '20'
AND LogHour = '12'
AND operationName = 'ReadFile'
AND identity_delegatedResource_resourceId LIKE '%dhsynapsews%' --synapse workspace
GROUP BY statusText,
         CAST(REPLACE(SUBSTRING(uri, PATINDEX('%EventMonth=%', uri) + 11, 2), '/', '') AS TINYINT),
         SUBSTRING(uri, PATINDEX('%EventDate=%', uri) + 10, 10)
ORDER BY 3
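If you'd rather narrow the logs using the time attribute inside the JSON rather than the log folder columns, here's a minimal sketch along the same lines as the queries above (the time window is purely illustrative). Note that filtering this way doesn't prune the log folders, so more log files may be scanned:

--filter the logs on the time column within the JSON payload
--rather than the LogYear/LogMonth/LogDay/LogHour folder columns
SELECT operationName,
       COUNT(DISTINCT uri) AS FileScanCount
FROM dbo.vwAnalyseLogs
WHERE time >= '2022-07-20 12:00:00'
AND time < '2022-07-20 13:00:00'
AND operationName = 'ReadFile'
GROUP BY operationName;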
For the first query we ran above, we can see the folders and files scanned. The aggregated log query shows that all the available month folders (September to December 2021) were scanned.


For the second query, if we run the same 2 SQL log statements as above (with the filters modified to select the appropriate logs) to see which folders were scanned, we see that only the EventMonth=10 folder was read. If we also query to see which EventDate folders were read, we see that only the EventDate=2021-10-02 folder was scanned. Serverless SQL pools successfully pruned the partitions when the filepath column was used in the WHERE clause.


Finally, if we look at the logs for the third query, where we filtered the data but used a column in the parquet data and not a file path column, we can see that all folders and files are scanned. This is because the column itself is in the Parquet file and Serverless has to scan all the files to find the relevant values and return the results. This significantly increases the amount of data processed, although we get the same results as the second query that used the file path column for filtering.


Conclusion
In this blog we covered how to configure logging with Azure Monitor to record activity between serverless SQL pools and Azure Data Lake Gen2. We then ran several queries to generate activity and analyzed that activity by querying the log files with serverless SQL pools. In the next part of this series, we'll look at using Log Analytics to collect and view activity between serverless SQL pools and Azure Data Lake Gen2. Log Analytics is preferable in an environment where activity is monitored across multiple systems, as it provides a central location to view logs.
We might also consider using the metrics contained in the logs to calculate the read size of the data and compare it to serverless SQL pool monitoring, maybe something for a future blog update.
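As a starting point for that comparison, here's a rough sketch that totals the responseBodySize values reported in the logs per hour; this assumes responseBodySize is a reasonable proxy for the bytes read by each request, which is worth validating against the Monitor pane figures:

--approximate the data read from the data lake per hour by summing response body sizes
SELECT LogYear, LogMonth, LogDay, LogHour,
       SUM(CAST(properties_responseBodySize AS BIGINT)) / 1048576 AS ApproxMBRead
FROM dbo.vwAnalyseLogs
WHERE operationName = 'ReadFile'
AND identity_delegatedResource_resourceId LIKE '%dhsynapsews%' --synapse workspace
GROUP BY LogYear, LogMonth, LogDay, LogHour
ORDER BY LogYear, LogMonth, LogDay, LogHour;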
References
- Monitoring Azure Blob Storage | Microsoft Docs
- Azure Blob Storage monitoring data reference | Microsoft Docs
- Query JSON files using serverless SQL pool - Azure Synapse Analytics | Microsoft Docs
- Storage Analytics logged operations and status messages (REST API) - Azure Storage | Microsoft Docs
- Storage Analytics log format (REST API) - Azure Storage | Microsoft Docs
- Azure Monitor pricing | Microsoft Azure
- Monitoring | Microsoft Azure
- ASCII graphics and ISO 1252 Latin-1 character set | barcodefaq.com
- JSON formatter and validator (curiousconcept.com)