User Community Service Desk Downloads
If you can't find the product or version you're looking for, visit support.ataccama.com/downloads

Azure Synapse Pushdown Processing

Pushdown processing in Azure Synapse can be utilized both for profiling and for data quality evaluation.

Once pushdown is enabled for profiling, all catalog items under that connection are profiled using pushdown. For more information about pushdown profiling, see Pushdown profiling.

For DQ evaluation using pushdown, it is possible to exclude selected catalog items, in the case that any rules used in the catalog item are not supported in pushdown. You can disable pushdown for DQ evaluation at a catalog item level or for individual monitoring projects. For more information about DQ evaluation in pushdown, see Pushdown for DQ evaluation.

Pushdown profiling

How to enable pushdown profiling on Azure Synapse connections

This section covers how to enable pushdown profiling. Pushdown profiling is enabled at the data source Connection level.

  1. Enable pushdown processing on a Synapse connection via the connection configuration details. Follow the instructions relevant to your use case before proceeding to step 2:

    Instructions for new connections
    1. In Knowledge Catalog > Sources, select your Synapse source.

      If you need to create a new source, see Create a source in Relational Database Connection.
    2. Select Add connection.

    3. In Connection type, select Azure Synapse Analytics.

    4. Provide the following:

      • Name: A meaningful name for your connection. This is used to indicate the location of catalog items.

      • Description (Optional): A short description of the connection.

      • Dpe label (Optional): Assign the processing of a data source to a particular data processing engine (DPE) by entering the DPE label assigned to the engine.

      • JDBC: A JDBC connection string pointing to the IP address or the URL where the data source can be reached.

        The connection string should take the form:

        jdbc:sqlserver://synapse-pushdown.sql.azuresynapse.net:1234;database=name;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;
    5. Proceed to step 2.

    Instructions for existing connections
    1. In Knowledge Catalog > Sources, select your Synapse source.

    2. Select the Connections tab, and click the connection name.

    3. Select the three dots menu in the upper-right corner of the screen, and then select Edit.

    4. Proceed to step 2.

  2. In the Pushdown processing section, enable Profile data using pushdown processing.

  3. You now need to choose whether you want to run term detection when pushdown processing is enabled, or disable term detection.

    Term detection jobs (domain detection and fingerprints for AI term suggestions) can’t run in pushdown, so if you run term detection with pushdown enabled it runs on a data sample which is sent to ONE.
    • Select Run term detection on a data sample in ONE to trigger detection jobs alongside profiling. Otherwise, select Disable term detection.

      If you disable term detection, no terms are applied during profiling or documentation flows, and no term suggestions are shown for assets from this connection.

How does it work?

After enabling pushdown profiling for your Synapse connection, profiling for all catalog items under that connection is processed in pushdown.

Standard profiling fetches all catalog item data to Ataccama Data Processing Engine (DPE), iterates rows, and executes the profiling locally. Pushdown profiling instead creates complex specialized database-specific SQL queries that are executed inside Synapse, and only the profiling results are returned to Ataccama DPE.

By default, when you start profiling, a single SQL query is generated per column that combines all the statistics Ataccama requires for that column. The total number of queries for the table is determined by the number of columns in the table: it is possible to minimize the number of queries by merging queries for multiple columns into a single, bigger (highly-optimized) SQL query. See Pushdown profiling configuration.

Pushdown profiling only applies to full profiling. Sample profiling always runs on Ataccama DPEs.

Pushdown profiling configuration

The default pushdown configuration creates one SQL query for each table column and retrieves all supported statistics at once.

It is possible to modify the number of DPE threads and the number of columns processed by a single query and to split standard and frequency analysis.

Property Data type Description

com.ataccama.synapse.profiling.parallelStep

Number

The number of columns processed at once when fetching basic statistics. This property is irrelevant if mergedStatistics=true. Default value: 1.

com.ataccama.synapse.profiling.maxCombinedColumns

Number

The number of columns processed at once when fetching frequencies or merged statistics.

Default value: 1

com.ataccama.synapse.profiling.mergedStatistics

Boolean

Get both basic statistics and frequencies at once (for a group of columns).

Default value: true.

com.ataccama.synapse.profiling.corePoolSize

Number

The number of queries processed at once. This setting limits the number of queries sent from DPE to Synapse for profiling. It is applied globally for the whole DPE and the number of allowed queries is shared between all Synapse Pushdown profiling jobs.

Default value: 32.

com.ataccama.synapse.profiling.maxStringLength

Number

The maximum length of data that will be processed in each cell (data types: STRING, JSON, ARRAY, STRUCT). The property should be set only when the data is too big to process.

Example value: 100

In self-managed deployments these properties can be found in dpe/etc/application.properties. For Ataccama Cloud deployments this needs to be done by Ataccama: get in touch with your Ataccama Customer Success Manager if this is required.

Pushdown for DQ evaluation

After enabling DQ evaluation in pushdown for your Synapse connection, processing of all supported rules on catalog items under that connection runs in pushdown. Only the DQ results and invalid samples, if configured, are returned to ONE.

To enable pushdown for DQ evaluation, see How to enable pushdown for DQ evaluation on Synapse connections.

However, there are limitations to the type of rules which can be processed in pushdown.

For this reason, it is possible to turn off pushdown for DQ evaluation at both the catalog item level and the monitoring project level.

How to enable pushdown for DQ evaluation on Synapse connections

  1. Enable pushdown processing on a Synapse connection via the connection configuration details. Follow the instructions relevant to your use case before proceeding to step 2:

    Instructions for new connections
    1. In Knowledge Catalog > Sources, select your Synapse source.

    2. Select Add connection.

    3. In Connection type, select Azure Synapse Analytics.

    4. Provide the following:

      • Name: A meaningful name for your connection. This is used to indicate the location of catalog items.

      • Description (Optional): A short description of the connection.

      • Dpe label (Optional): Assign the processing of a data source to a particular data processing engine (DPE) by entering the DPE label assigned to the engine.

      • JDBC: A JDBC connection string pointing to the IP address or the URL where the data source can be reached.

        The connection string should take the form:

        jdbc:sqlserver://synapse-pushdown.sql.azuresynapse.net:1234;database=name;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;
    5. Proceed to step 2.

    Instructions for existing connections
    1. In Knowledge Catalog > Sources, select your Synapse source.

    2. Select the Connections tab, and click the connection name.

    3. Select the three dots menu in the upper-right corner of the screen, and then select Edit.

    4. Proceed to step 2.

  2. In the Pushdown processing section, enable Profile data using pushdown processing. It is not possible to enable pushdown for DQ evaluation without first enabling pushdown profiling.

  3. . You now need to choose whether you want to run term detection when pushdown processing is enabled, or disable term detection.

    Term detection jobs (domain detection and fingerprints for AI term suggestions) can’t run in pushdown, so if you run term detection with pushdown enabled it runs on a data sample which is sent to ONE.
    • Select Run term detection on a data sample in ONE to trigger detection jobs alongside profiling. Otherwise, select Disable term detection.

      If you disable term detection, no terms are applied during profiling or documentation flows, and no term suggestions are shown for assets from this connection.
  4. Next, enable Evaluate Data Quality using pushdown processing.

    In order to use pushdown capabilities a new schema must be created in your Synapse database for temporary storage of Ataccama data required for processing. This is used for caching lookup keys only. Continue following the steps to create your schema.
    Set up operational database
  5. To set up your working database schema:

    1. In Schema name, provide a name for the schema.

    2. Select Use operational credentials for uploading files if the connection credentials are not sufficient to create new tables in the schema. You need to add write credentials in the next step.

      1. Select + Add credentials and follow the instructions in Add credentials.

        This option is only available when you are setting up new connections.
    3. Select Download file

    4. Run this script in your chosen database management tool. This creates necessary functions within the schema defined in step 5.

      Example of script run in dbeaver
This new schema should only be used for Ataccama processing data. Don’t use it for data storage.

How does it work?

Ataccama Data Quality Pushdown translates Ataccama Expressions into native SQL queries. For each individual catalog item, Ataccama Data Processing Engine (DPE) generates a single SQL query that incorporates all the rules applied to the item, converted to SQL.

If you use lookup data in the rules, Ataccama DPE uploads those lookups into operational databases and provides a JOIN operation between the lookup table and the processed catalog item in order to match the values. For small lookups (where the lookup key column is less than 100 kB after values have been normalized), the lookup data is embedded directly into the SQL instead of providing a JOIN operation, optimizing the processing.

The complexity of the generated query might vary significantly based on the DQ rules assigned to the catalog item. If your monitoring project has invalid samples enabled, Ataccama DPE generates and runs another SQL query that returns a sample of invalid records to be stored in ONE.

Enable and disable pushdown on the catalog item level

The following information is relevant for catalog items and DQ evaluation in the Data Catalog.

Once pushdown for DQ evaluation has been enabled at the connection level, it is possible to turn this off on the catalog item level. This is useful if you have rules applied to the catalog item which are not supported in pushdown.

To disable pushdown processing for DQ evaluation:

  1. In Knowledge Catalog > Catalog Items, select your catalog item.

  2. Select the three dots menu and then select Edit.

  3. Clear Enable Data Quality evaluation using pushdown.

    disable pushdown on catalog item level

If rules not supported in pushdown are applied to items in the catalog, these rules are skipped during DQ evaluation. For more information, see Supported rules.

Enable and disable pushdown on the monitoring project level

The following information is relevant for Monitoring Projects and the catalog items added to those projects.

Once pushdown for DQ evaluation has been enabled at the connection level, it is possible to turn this off on the monitoring project level. This is useful if you have rules which are not supported in pushdown applied to items in the monitoring projects.

To disable pushdown processing for monitoring projects:

  1. In Data Quality > Monitoring Projects, select your monitoring project.

  2. Select the three dots menu and then Advanced settings.

  3. In Pushdown processing, use the toggles to disable pushdown processing for any catalog items containing unsupported rules.

enable or disable pushdown in monitoring project

If rules not supported in pushdown are applied to items in the monitoring project, the monitoring project can’t be run until the rule has been removed or pushdown processing has been disabled for the affected catalog items.

remove rule or disable pushdown

Supported rules

The following rules are supported in Synapse pushdown:

  • All rules created using the Condition Builder, as all expression functions available in the condition builder are supported by pushdown. This includes aggregation rules built using the condition builder.

  • Parametric rules.

  • Advanced Expression rules, rules generated using the Ask AI feature, and rules using variables, as long as they include only Supported functions and operators.

Supported functions and operators

The data quality processing configurations available in ONE are translated to Synapse SQL language, either by utilizing native SQL functions or creating new functions. For this reason, there are some rule functions which are not supported.

You are notified about rules not supported in pushdown either during rule creation or when trying to add rules to attributes.

Examples
rule not supported in pushdown in any source
Figure 1. Rule creation - rule not supported in pushdown in any source
rule not supported in pushdown in some sources
Figure 2. Rule creation - rule not supported in pushdown in some sources
can’t apply rule in catalog
Figure 3. Apply rule in catalog - rule not supported
can’t apply rule in monitoring project
Figure 4. Apply rule in monitoring project - rule not supported

The full list of supported functions can be found here.

Expand to see list
  • eraseSpacesInNames

  • removeAccents

  • trashNonDigits

  • trashNonLetters

  • toDateTime

  • toString

  • toDate

  • left

  • right

  • count

  • math.ceil, math.ceiling

  • math.floor

  • math.longCiel, math.longCeiling

  • math.longFloor

  • math.round

  • math.srq

  • lower

  • replace

  • is null

  • is not null

  • substr

  • trim

  • trimLeft

  • trimRight

  • upper

  • squeezeSpaces

  • dateAdd

  • dateDiff

  • datePart

  • dateTrunc

  • getDate

  • toFloat

  • toInteger

  • toLong

  • + (concatenate)

  • now

  • today

  • containsWord

  • indexOf

  • lastIndexOf

  • length

  • avg

  • avgif

  • maximum

  • maximumif

  • minimum

  • minimumif

  • sum

  • sumif

  • in

  • is in

  • is not in

  • not in

  • <

  • < =

  • <>, !=

  • =, ==

  • >

  • > =

  • AND

  • OR

  • is

  • is not

  • isNumber

  • isInFile

  • isNotInFile

  • is null

  • is not null

  • addition

  • subtraction

  • multiplication

  • division

Ataccama DPEs handle the translation of these configurations to Synapse functions.

Limitations and known issues

  • Pushdown profiling:

    • Pushdown profiling is not supported for virtual catalog items (VCIs).

    • Frequency group analysis statistics cannot be returned when pushdown profiling is used.

  • Pushdown for DQ evaluation:

    • Pushdown for DQ evaluation is not supported for VCIs or post-processing components.

    • Some rule types and functions are not supported, these are:

If rules not supported in pushdown are applied to items in the data catalog, these rules are skipped during DQ evaluation. If rules not supported in pushdown are applied to items in a monitoring project, the monitoring project can’t be run until the rule has been removed or pushdown processing has been disabled for the affected catalog items.

Was this page useful?