With serverless SQL pools, you can query data stored in Azure Data Lake Storage, Cosmos DB analytical storage, or Dataverse without having to import your data into database tables. It is important to apply best practices and tune your schema and queries for optimal performance. Some of the most important best practices are:
- Make sure your client (Power BI), server, and storage tier are in the same region.
- If you are using ADLS, make sure that the files are of a consistent size in the range of 100 MB to 10 GB.
- Use well-designed schemas and optimal types. Organize your tables/views in a star/snowflake pattern with pre-aggregated measures, avoid the VARCHAR(MAX) type for smaller strings, don't use NVARCHAR for UTF-8 encoded data, and use a BIN2_UTF8 collation for string columns that are used in filters.
You can find more best practices like these here. These best practices matter because the issues they address can cause real performance degradation; you may be surprised how much applying some of them can improve the performance of your workload.
The last point, schema optimization, is sometimes difficult to verify. You would have to look at your schema, examine all the columns, and figure out what to optimize. If you have a large schema, this is not an easy task. You can make your life easier by using the QPI helper library, which can detect schema problems for you.
QPI Library
Query Performance Insights (QPI) is a free and open-source set of T-SQL views that you can set up on your database to get useful information about it. You can install the library simply by running its setup script. After running the script (you can inspect it manually and confirm that it only creates views over some system DMVs), you can use the views in the qpi schema to find recommendations for tuning your database or fixing your queries. In this article, I'll talk about the view that returns recommendations explaining how to optimize the schema (column types) in your serverless database.
How do I find performance recommendations?
After configuring the QPI library by running its setup script, you will be able to query the views created by that script. You can read the recommendations by querying the qpi.recommendations view:
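A minimal query is enough to see them (the exact set of columns may differ slightly between QPI versions):

SELECT *
FROM qpi.recommendations;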
This view examines your tables, views, and columns and provides tips for optimizing your schema. You will see the name of the recommendation, its score (a higher score means a greater performance impact), the name of the object (view or table), and the name of the column (if any) where you can apply the recommendation. The last column shows the reason why the recommendation was generated.
Recommendation types
Each recommendation has a type because you can apply different optimizations in different cases. The following sections explain the most common recommendation types.
Optimize column type
Column types should be as small as possible so that the query optimizer can more accurately estimate the resources needed for a query (you don't want a SELECT TOP 10 to run on 11 VMs because the cost was overestimated). You should avoid large types like VARCHAR(MAX) or VARCHAR(8000) whenever possible. The QPI library examines all types and finds those that may be too large (for example, LOB types or large VARCHAR types). Look at those columns and see if you can use smaller types.
For example, suppose (dbo, demographics, population) appears in the recommendation. You should open the dbo.demographics view definition and see if you can change the column types:
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/csv/*.csv',
    FORMAT = 'CSV'
) WITH (
    country_code VARCHAR(5),
    population bigint
) AS r
The population column is bigint, but perhaps it could be changed to smallint.
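If the source values fit into a smaller integer range, the adjusted definition could look like this (a sketch, assuming the population values really fit into smallint):

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorage.blob.core.windows.net/csv/*.csv',
    FORMAT = 'CSV'
) WITH (
    country_code VARCHAR(5),
    population smallint   -- narrowed from bigint
) AS r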
Optimize Key Column Type
Some columns are used in join conditions (like primary and foreign keys in standard databases). These columns should have the most restrictive types. When you join tables on column equality, the join columns should ideally be small, in the range of 4 to 8 bytes. Joining tables on long string columns degrades performance. This recommendation shows the schema, table, and column that the QPI library believes can be used as a key in a join condition, and recommends using a smaller type where possible.
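As a sketch (the view, storage URL, and column names below are hypothetical), a join key that arrives as a long string can be narrowed so that joins on it stay cheap:

CREATE OR ALTER VIEW dbo.sales AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/sales/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    customer_id VARCHAR(8),   -- was VARCHAR(8000); keep join keys in the 4-8 byte range
    amount decimal(10, 2)
) AS r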
Optimize String Filter
You might have some string columns that are used to filter data. Filtering Parquet, Delta, and Cosmos DB data is much faster when the column collation is Latin1_General_100_BIN2_UTF8. Apply this collation to a string column if your queries filter data using this column as a predicate.
For example, suppose (dbo,demographics,country_code) appears in the recommendation. You need to open the dbo.demographics view definition and add the COLLATE Latin1_General_100_BIN2_UTF8 clause after the type definition:
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_code VARCHAR(5) COLLATE Latin1_General_100_BIN2_UTF8,
    population smallint
) AS r
Make this change only if you are sure that you are filtering data using this column. Forcing a BIN2 collation can produce unexpected results in sort operations, because BIN2 collations sort by character code.
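For illustration, a query like the following (the filter value is hypothetical) benefits from the binary collation because the string comparison in the predicate can be evaluated much faster against the Delta/Parquet data:

SELECT country_code, SUM(population) AS total_population
FROM dbo.demographics
WHERE country_code = 'US'
GROUP BY country_code;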
Use better column types
You may have some numeric, date, or GUID values declared as VARCHAR, VARBINARY, or similar columns. When you see this recommendation, look at the column type and check whether the declared type really matches the data stored in the column.
For example, suppose that (dbo, demographics, country_hash) and (dbo, demographics, date_modified) appear in the recommendation. You should open the dbo.demographics view definition and see if you need to change the types:
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_hash VARCHAR(8000),
    date_modified VARCHAR(30)
) AS r
A hash value is usually a short string, so the VARCHAR(8000) type is likely oversized. A column named date_modified should probably be represented as datetime2 instead of a string.
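As a sketch, assuming country_hash is a short hexadecimal string and date_modified holds date-time values, the adjusted definition could look like this:

CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/d',
    FORMAT = 'DELTA'
) WITH (
    country_hash VARCHAR(32),   -- sized for the actual hash length
    date_modified datetime2     -- parsed as a date-time instead of a string
) AS r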
Use VARCHAR UTF-8 type
String values in some formats, such as Delta or Parquet files, are encoded using UTF-8. When reading string values from these files, you should represent them as a VARCHAR type with a UTF-8 collation. You will see the "Use VARCHAR UTF-8 type" recommendation if you need to change NVARCHAR columns to VARCHAR when querying Parquet, Delta, Cosmos DB, or UTF-8 encoded CSV files. String values encoded in UTF-8 should be represented as a VARCHAR type with a UTF8 collation.
For example, suppose (dbo, demographics, stateName) appears in the recommendation. You should open the dbo.demographics view definition and see if you need to change the type:
CREATE OR ALTER VIEW dbo.demographics AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/data/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    [stateName] NVARCHAR(50),   --> replace with VARCHAR(50) COLLATE Latin1_General_100_BIN2_UTF8
    [population] bigint
) AS [r]
The Parquet format contains UTF-8 encoded strings, so the string column type must be a VARCHAR type with some UTF8 collation instead of NVARCHAR.
For more information on UTF-8 collations, see this post.
Use NVARCHAR type
CSV files can be saved as Unicode files with UTF-16 encoding. In this case, you must represent the string values from the UTF-16 encoded files as NVARCHAR types. You will see the "Use NVARCHAR type" recommendation when a VARCHAR type is defined for a UTF-16 file. If you see this recommendation, change your VARCHAR columns to NVARCHAR when querying UTF-16 encoded CSV files. String values encoded in UTF-16 must be represented as an NVARCHAR type.
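A minimal sketch (the storage URL and column names are hypothetical) of a view over a UTF-16 encoded CSV file, with the string column declared as NVARCHAR:

CREATE OR ALTER VIEW dbo.demographics_utf16 AS
SELECT *
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/csv-utf16/*.csv',
    FORMAT = 'CSV'
) WITH (
    [stateName] NVARCHAR(50),   -- UTF-16 encoded strings are read as NVARCHAR
    [population] bigint
) AS r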
Replace the table with a partitioned view
One important recommendation concerns partitioned data sets. At the time of writing, external tables are not partition-aware, so they cannot eliminate partitions when querying partitioned data sets. The QPI library finds these tables and recommends replacing them with partitioned views.
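A minimal sketch of such a partitioned view (the storage URL and folder layout are hypothetical): exposing the wildcard value through the filepath() function lets the serverless SQL pool skip folders that do not match a predicate on [year]:

CREATE OR ALTER VIEW dbo.sales_partitioned AS
SELECT *, r.filepath(1) AS [year]
FROM OPENROWSET(
    BULK 'https://storage.blob.core.windows.net/sales/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r

A query such as SELECT * FROM dbo.sales_partitioned WHERE [year] = '2023' then reads only the matching folders.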
Remove duplicate references
Sometimes you may have two tables or views referencing the same underlying data. While this is not a bug, it is an overhead because you have to keep both schemas in sync and tune both of them. It can also happen that some queries run fast and others run slowly simply because one of them references the object without optimized types while the other has them. If possible, avoid having multiple tables or views that reference the same data, and keep a 1:1 relationship between tables/views and the underlying data.
Conclusion
The QPI library is an easy-to-use tool for identifying potential problems in your database schema that could affect the performance of your queries. With this library, you can view the recommendations and apply the changes they describe to optimize your database schema.