Create a Data Slice
Using data slices, you can evaluate a subset of a catalog item, based on selected criteria, rather than processing the entire item with every job. This reduces both computation and cost.
Data slices are especially helpful when you are dealing with large tables that are updated frequently, as you can run DQ evaluation only for a relevant subset of the data (filtered by date or country, for example) instead of over millions or even billions of records.
While it is still possible to create SQL catalog items or virtual catalog items, data slices support additional use cases that cannot be achieved using these methods.
Using data slices, you can:
- Slice a large volume of data by a selected attribute, and run monitoring only on this relevant subset of data.
- Easily define what the subset is based on, for example, an attribute containing date information or a country. The definition can be dynamic (that is, always based on the previous day or month) or ad hoc (based on a specific range).
- See results based on the data subset, including historical results.
Currently, data slices can only be used within monitoring projects. For more information, see Next steps.
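The difference between a dynamic and an ad hoc slice can be illustrated with a minimal Python sketch. This is only a conceptual illustration, not how the product evaluates slices internally: the function names, the sample records, and the `order_date` and `country` attribute names are all hypothetical.

```python
from datetime import date, timedelta


def previous_day_range(today: date) -> tuple[date, date]:
    """Dynamic slice: always the day before `today`."""
    yesterday = today - timedelta(days=1)
    return yesterday, yesterday


def previous_month_range(today: date) -> tuple[date, date]:
    """Dynamic slice: the full calendar month before `today`."""
    last_of_previous = today.replace(day=1) - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def slice_by_date(rows, attribute, start, end):
    """Keep only rows whose `attribute` falls within the inclusive range."""
    return [row for row in rows if start <= row[attribute] <= end]


# Hypothetical records; the attribute names are invented for this example.
rows = [
    {"order_date": date(2024, 2, 28), "country": "CZ"},
    {"order_date": date(2024, 3, 14), "country": "DE"},
    {"order_date": date(2024, 3, 15), "country": "CZ"},
]

today = date(2024, 3, 15)
# Dynamic: the range is recomputed relative to the day the job runs.
dynamic = slice_by_date(rows, "order_date", *previous_day_range(today))
# Ad hoc: the range is fixed, no matter when the job runs.
ad_hoc = slice_by_date(rows, "order_date", date(2024, 2, 1), date(2024, 2, 29))
```

In this sketch, the dynamic slice selects a different record each day the job runs, while the ad hoc slice always selects the records from February 2024.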
To create a data slice:
- In Data Catalog > Catalog Items, select the catalog item for which you want to create a data slice.
- Using the three dots menu, select Create data slice.
  You can also create data slices from the Data Slices section of the catalog item Data Structure tab or in monitoring projects where data slices are already active.
- Provide a name for the new data slice.
- Select the Slice by attribute. This is the attribute by which the data will be filtered.
  Only attributes of Date, Datetime, String, Integer, and Long data types can be selected.
- Specify the Slice by criteria, for example, the values you are interested in (for a String attribute) or a date range (for Date and Datetime attributes).
  - Date data type example
  - String data type example
  Select the time range from the options provided, or select Custom to define your own. If you are using is from list or is not from list, use the plus icon to add additional values to the list.
- Save and publish.
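As a rough mental model of the slice criteria described in the steps above, the following Python sketch checks whether an attribute type can be used for slicing and applies the is from list and is not from list conditions. It is a simplified illustration under stated assumptions, not the product's implementation: the helper names `can_slice_by` and `matches` and the sample values are hypothetical.

```python
# Data types that this page lists as selectable for the Slice by attribute.
SLICEABLE_TYPES = {"Date", "Datetime", "String", "Integer", "Long"}


def can_slice_by(data_type: str) -> bool:
    """Return True if an attribute of this data type can be used for slicing."""
    return data_type in SLICEABLE_TYPES


def matches(value, operator: str, values: list) -> bool:
    """Evaluate a list-based slice condition for a single value."""
    if operator == "is from list":
        return value in values
    if operator == "is not from list":
        return value not in values
    raise ValueError(f"Unsupported operator: {operator}")


# Hypothetical values of a String attribute holding country codes.
countries = ["CZ", "DE", "FR", "CZ"]
selected = [c for c in countries if matches(c, "is from list", ["CZ", "FR"])]
excluded = [c for c in countries if matches(c, "is not from list", ["CZ", "FR"])]
```

Here `selected` keeps the records for CZ and FR, and `excluded` keeps everything else, which mirrors how adding values with the plus icon widens or narrows the slice.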
If you have date information saved as, for example, a To do this:
If a catalog item has data slices, this is shown on the catalog item Overview tab. Look for Data Slices in the Summary section and select View to see details.
Alternatively, navigate to the Data Structure tab and find the Data Slices section.
It is also possible to create, edit, and delete slices from this section. Use the three dots menu to edit or delete slices.
If you edit a data slice that is already in use in a monitoring project, a warning is displayed. It is not possible to delete data slices used in a monitoring project.
Supported databases
The data slice functionality is supported for the following databases:
- Amazon Aurora MySQL
- Amazon Aurora PostgreSQL
- Amazon Redshift
- Azure Synapse Analytics
- Azure SQL Database
- BigQuery
- SAP HANA
- MariaDB
- MySQL
- Oracle
- PostgreSQL
- MSSQL Server
- Snowflake
- Teradata
Next steps
Now that you have created your data slice, you can use it within monitoring projects. After adding the catalog items for which you created data slices to a monitoring project, you can specify that only selected data slices are evaluated within that project. For more information, see Use data slices in monitoring projects in Monitoring Projects.