Optimize database schema on serverless SQL pools using QPI library (2023)

With serverless SQL pools, you can query data stored in Azure Data Lake Storage, Cosmos DB analytical storage, or Dataverse without having to import your data into database tables. It is very important to apply best practices and tune the schema/queries for optimal performance. Some of the most important best practices are:

  • Make sure your client (e.g., Power BI), server, and storage tier are in the same region.
  • If you are using ADLS, make sure that the files are of a consistent size in the range of 100 MB to 10 GB.
  • Be sure to use schemas and optimal types. Organize your tables/views in a star/snowflake pattern with pre-aggregated measures, avoid the VARCHAR(MAX) type for smaller strings, don't use NVARCHAR for UTF-8 encoded data, and use a BIN2_UTF8 collation on string columns that are used for filtering.
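As a sketch of the last bullet, a view definition over Parquet data might apply all three typing tips at once. The storage path and column names here are hypothetical:

```sql
-- Hypothetical view applying the typing tips above: compact VARCHAR sizes
-- instead of VARCHAR(MAX), no NVARCHAR for UTF-8 data, and a BIN2_UTF8
-- collation on the column that queries filter on.
CREATE OR ALTER VIEW dbo.sales AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/parquet/sales/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    region VARCHAR(10) COLLATE Latin1_General_100_BIN2_UTF8, -- filter column
    product_name VARCHAR(200),  -- sized to the data, not VARCHAR(MAX)
    amount DECIMAL(10, 2)
) AS r;
```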

You can find more best practices here. These best practices are important because ignoring them can cause performance degradation. You may be surprised how much applying some of them can improve the performance of your workload.

The last point, related to schema optimization, is sometimes difficult to verify. You would have to look at your schema, examine all the columns, and figure out what to optimize. If you have a large schema, this may not be an easy task. But you can make your life easier by using the QPI helper library, which can detect schema problems for you.


QPI Library

Query Performance Insights (QPI) is a free, open-source set of T-SQL views that you can set up on your database to get useful information about it. You can install this library simply by running its setup script. After running the script (you can inspect it manually and confirm that it only reads from system DMVs), you can use the views in the qpi schema to find recommendations for tuning your database or fixing your queries. In this article, I'll talk about the view that returns recommendations explaining how to optimize the schema (column types) in your serverless database.
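As a quick sanity check after running the setup script, you can list the objects it created. This is plain T-SQL with no QPI-specific assumptions:

```sql
-- List the views that the QPI setup script created in the qpi schema
SELECT name
FROM sys.views
WHERE schema_id = SCHEMA_ID('qpi');
```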

How do I find performance recommendations?

After configuring the QPI library by running its setup script, you will be able to query the views it creates. You can read the recommendations by querying the qpi.recommendations view:
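A plain SELECT over the view is enough to see everything it found:

```sql
-- Read all schema recommendations produced by the QPI library
SELECT *
FROM qpi.recommendations;
```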


This view examines your tables, views, and columns and provides tips for optimizing your schema. You will see the name of the recommendation, a score (a higher score means a greater performance impact), the name of the object (view or table), and, if applicable, the name of the column where you can apply the recommendation. The last column shows the reason why the recommendation was generated.

Recommendation types

Each recommendation has a type because you can apply different optimizations in different cases. The following sections explain the most common recommendation types.


Optimize column type

Column types should be as small as possible so that the query optimizer can more accurately estimate the resources for the query (you don't want a SELECT TOP 10 to run on 11 VMs because the optimizer overestimated the data size). You should avoid large types like VARCHAR(MAX) or VARCHAR(8000) whenever possible. The QPI library examines all types and finds those that may be too large (for example, LOB types or large VARCHAR types). Look at those columns and see if you can use smaller types.

For example, suppose (dbo, demographics, population) appears in the recommendation. You should open the dbo.demographics view definition and see if you can change the column types:

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/csv/*.csv',
    FORMAT = 'CSV'
) WITH (
    country_code VARCHAR(5),
    population bigint
) AS r;

The population column is bigint, but perhaps it could be changed to smallint.
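Assuming the values actually fit into the smaller type, the adjusted definition might look like this (note that smallint only reaches 32,767, so for real population data int may be the safer choice):

```sql
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/csv/*.csv',
    FORMAT = 'CSV'
) WITH (
    country_code VARCHAR(5),
    population smallint -- was bigint; smallint holds values only up to 32,767
) AS r;
```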

Optimize Key Column Type

Some columns may be used in join conditions (like primary and foreign keys in standard databases). These columns should have stricter type constraints. When joining tables on column equality, the join columns should ideally be at most 4 to 8 bytes long. Joining tables on long string columns degrades performance. This recommendation shows the schema, table, and column that the QPI library believes may be used as a key in join conditions, and recommends using smaller types where possible.
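As an illustration (the dbo.facts table and its columns are made up), joining on a short code column keeps the key within that 4-to-8-byte range, whereas joining on a long name column would not:

```sql
-- Compact join key: country_code VARCHAR(5) is at most 5 bytes,
-- unlike a hypothetical country_name VARCHAR(200) column.
SELECT d.country_code, f.population
FROM dbo.demographics AS d
JOIN dbo.facts AS f
    ON f.country_code = d.country_code;
```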

Optimize String Filter

You might have some string columns that are used to filter data. Parquet, Delta, and Cosmos DB data is filtered much faster when the column collation is Latin1_General_100_BIN2_UTF8. Apply this collation to a string column if your queries filter data using that column as a predicate.


For example, suppose (dbo,demographics,country_code) appears in the recommendation. You need to open the dbo.demographics view definition and add the COLLATE Latin1_General_100_BIN2_UTF8 clause after the type definition:

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_code VARCHAR(5) COLLATE Latin1_General_100_BIN2_UTF8,
    population smallint
) AS r;

Make this change only if you are sure that you filter data using this column. Forcing a BIN2 collation can produce unexpected results in sort operations, because BIN2 collations order strings by character code points.
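To see that caveat concretely, under a BIN2 collation every uppercase letter sorts before every lowercase one, because ordering follows raw code points:

```sql
-- 'Zagreb' sorts before 'ankara' under BIN2, because the code point
-- of 'Z' (U+005A) is lower than that of 'a' (U+0061).
SELECT value
FROM (VALUES ('ankara'), ('Zagreb')) AS t(value)
ORDER BY value COLLATE Latin1_General_100_BIN2_UTF8;
```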

Use better column types

You may have numeric, date, or GUID values stored in columns declared as VARCHAR, VARBINARY, etc. When you see this recommendation, look at the column type and consider whether a wrong type has been assigned to the column.

For example, suppose that (dbo, demographics, country_hash) and (dbo, demographics, date_modified) appear in the recommendation. You should open the dbo.demographics view definition and see if you need to change the types:

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_hash VARCHAR(8000),
    date_modified VARCHAR(30)
) AS r;

A hash value is usually a short string, so the VARCHAR(8000) type is probably oversized. If the column is named date_modified, it should probably be represented as datetime2 instead of a string.
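A corrected definition might look like the following; the exact hash length is an assumption here, so check it against your actual data before applying it:

```sql
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_hash VARCHAR(16),  -- assumed hash length instead of VARCHAR(8000)
    date_modified datetime2    -- a real date type instead of VARCHAR(30)
) AS r;
```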


Use VARCHAR UTF-8 type

String values in some formats, such as Delta or Parquet files, are encoded using Unicode UTF-8. When reading string values from these files, you should represent them as a VARCHAR type with a UTF-8 collation. You will see the "Use VARCHAR UTF-8 type" recommendation if you need to change your NVARCHAR columns to the VARCHAR type when querying Parquet, Delta, Cosmos DB, or UTF-8 encoded CSV files. String values encoded in UTF-8 should be represented as a VARCHAR type with a UTF-8 collation.

For example, suppose (dbo, demographics, stateName) appears in the recommendation. You should open the dbo.demographics view definition and see if you need to change the type:

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/data/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    [stateName] NVARCHAR(50), -- replace with VARCHAR(50) COLLATE Latin1_General_100_BIN2_UTF8
    [population] bigint
) AS [r];

The Parquet format contains UTF-8 encoded strings, so the string column type must be a VARCHAR type with some UTF8 collation instead of NVARCHAR.
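Following that advice, the adjusted view would read:

```sql
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/data/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    -- VARCHAR with a UTF-8 collation instead of NVARCHAR(50)
    [stateName] VARCHAR(50) COLLATE Latin1_General_100_BIN2_UTF8,
    [population] bigint
) AS [r];
```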

For more information on UTF-8 collations, see this post.

Use NVARCHAR type

CSV files can be saved as Unicode files with UTF-16 encoding. In this case, you must represent the string values of the UTF-16 encoded files as NVARCHAR types. You will see the "Use NVARCHAR type" recommendation when a VARCHAR type is defined over a UTF-16 file. If you see this recommendation, you should change your VARCHAR columns to the NVARCHAR type when querying UTF-16 encoded CSV files.
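A minimal sketch, assuming a hypothetical UTF-16 encoded CSV file (the path and column names are made up; note that, at the time of writing, CSV parser version 2.0 supports only UTF-8, so UTF-16 files are read with parser version 1.0):

```sql
-- Hypothetical UTF-16 CSV: string columns declared as NVARCHAR
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/csv/utf16/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '1.0'
) WITH (
    city NVARCHAR(100),
    population bigint
) AS r;
```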


Replace the table with a partitioned view

One important recommendation is to partition your data sets. At the time of writing, external tables do not support queries against partitioned data sets. The QPI library finds these tables and offers a recommendation to replace them with partitioned views.
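As a hedged sketch (the container layout and view name are assumptions), a partitioned view over folder-partitioned Parquet data can expose the partition value with the filepath() function, so that queries filtering on it skip whole folders:

```sql
-- Partitioned view: filepath(1) returns the value matched by the first
-- wildcard, so WHERE [year] = '2022' eliminates the other year folders.
CREATE OR ALTER VIEW dbo.sales_partitioned AS
SELECT r.filepath(1) AS [year], *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/data/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r;
```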

Remove duplicate references

Sometimes you may have two tables or views referencing the same underlying data. While this is not a bug, it is an overhead, because you have to keep both schemas in sync and tune both of them. It can happen that some queries run fast and others slowly simply because the slow group references the table that does not have optimized types like the first one. If possible, avoid multiple tables/views referencing the same data, and establish a 1:1 relationship between tables/views and data.


The QPI library is an easy-to-use tool to identify potential problems in your database schema that could affect the performance of your queries. With this library, you can view the recommendations and apply the changes described in the recommendations to optimize your database schema.




Article information

Author: Lilliana Bartoletti

Last Updated: 18/05/2023
