Run Profiling

After you connect to a data source, start understanding your data better by running data discovery on the source or profiling a particular catalog item.

Sample vs. full profiling: What’s the difference?

The following options are available for profiling:

Sample profiling

Analyzes a small sample of records from the selected catalog item as quickly as possible and with minimal performance impact. It calculates statistics, determines patterns, masks, and frequencies in your data, and applies rule-based and pattern-based domain detection to identify business domains.

Sample profiling without DQ evaluation processes 1% of your records, up to a maximum of 10,000 records.

Sample profile & DQ evaluation

Runs sample profiling followed by DQ evaluation.

Sample profiling with DQ evaluation processes 1% of your records, up to a maximum of 1,000,000 records.

Full profiling

Processes all records of the selected catalog item. This type of profiling also includes anomaly detection, which helps identify potential irregularities or corruptions in your data.

Full profile & DQ evaluation

Runs full profiling followed by DQ evaluation.

By default, all profiling options also run on partitions.

In addition, you can customize the default profiling configurations as needed, or create a custom one (see Configure Profiling).
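As a rough illustration of the sample limits described above, the following Python sketch estimates how many records a sample profiling run would process. It reads "1% of your records, or at most N records" as the smaller of the two values, which is an assumption. The function name `sample_size`, the `with_dq` parameter, and the rounding up of fractional results are hypothetical and are not part of the product.

```python
def sample_size(total_records: int, with_dq: bool = False) -> int:
    """Estimate the number of records processed by sample profiling.

    Assumption: the sample is 1% of the records, capped at 10,000
    (without DQ evaluation) or 1,000,000 (with DQ evaluation).
    """
    cap = 1_000_000 if with_dq else 10_000
    # Ceiling division so that small tables still yield at least one record.
    # The rounding direction is an assumption, not documented behavior.
    one_percent = -(-total_records // 100)
    return min(one_percent, cap)


if __name__ == "__main__":
    print(sample_size(500_000))                  # 5000
    print(sample_size(5_000_000))                # 10000
    print(sample_size(5_000_000, with_dq=True))  # 50000
```

Under this reading, the cap starts to apply at 1,000,000 records without DQ evaluation and at 100,000,000 records with DQ evaluation.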

Ad hoc profiling

If you run manual profiling on catalog items that are included in monitoring projects, any detected anomalies will also be visible in the monitoring project results.

To run profiling on an ad hoc basis:

A single catalog item

  1. In Data Catalog > Catalog Items, select the required catalog item (or attribute), and from the profiling menu, select the profiling you want to run.


Multiple catalog items

  1. In Data Catalog > Catalog Items, select the required catalog items from the list, and in the ribbon that appears, select the profiling you want to run.

    If the catalog items you want to analyze come from the same source, you can also run profiling from the source. In Data Catalog > Sources, select the required source, and from the Catalog Items tab, select the catalog items as needed and then the profiling option.

Profiling first imports the catalog item metadata, then processes the data. To track the progress, use the Processing Center. See Monitor profiling progress.

If the profiling was successful, navigate to the Profile & DQ insights tab of the relevant catalog item to see the results.

We do not recommend making changes to catalog item metadata after it’s been profiled. If attributes (or tables) are renamed, modified, or removed, subsequent profiling attempts might fail.

However, if such changes are required, first delete the catalog item and all the related objects, such as monitoring projects, then reimport the data and run profiling again.

Profiling on partitions

Profiling on partitions is supported only for metastore sources. It is also possible to work with partitions using SQL catalog items.

When working with data sources with partitioned tables, there are additional profiling configurations:

  • <Profiling configuration> of the last partition - Profile the last partition only.

    Last partition refers to the last partition in the full list of partitions, sorted in descending order.
  • <Profiling configuration> of custom partition - Profile a partition of your choice.

  • <Profiling configuration> - Profile each partition in the catalog item.

    The <Profiling configuration> refers to any of the available profiling configurations.

Schedule profiling

To run profiling following a particular schedule:

  1. In Data Catalog > Catalog Items, select the required catalog item, and in the three dots menu select Schedule.

  2. Select Add Scheduled Event.

  3. Configure the following:

    • In Type, select Profiling Schedule.

    • If there are partitions, select which partitions to profile in Partition.

    • In Configuration, select from the available profiling configurations (full, sample, or custom).

  4. Define the schedule using either Basic or Advanced configuration:

    • For Basic configuration:

      1. In Repeat, select from the list of options how often profiling should run.

      2. In At, specify the time (24-hour clock) at which the profiling should be run, and select from the list of available time zones.

      3. If required, select Queue action for later if the platform is not accessible at the scheduled time.

      4. In Valid from, define the date from which the schedule should be followed.

      5. If the schedule should have a finite end date, select Enable expiration, and in Valid to, define the date at which the schedule should no longer be effective.

    • For Advanced configuration:

      1. Set the schedule using Cron expression syntax. For more information, see Cron Expression Generator and Explainer.

      2. In Valid from, define the date from which the schedule should be followed.

      3. If the schedule should have a finite end date, select Enable expiration, and in Valid to, define the date at which the schedule should no longer be effective.

  5. Select Save and publish your changes.

You can see the scheduled jobs in the Processing Center under Scheduled jobs.
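To make the interplay of the Basic configuration fields more concrete, the following Python sketch computes the next run of a daily schedule from an At time, a time zone, and the Valid from and Valid to dates. It is only an illustration of the described behavior and does not show how the platform implements scheduling. The function `next_daily_run` and its parameters are hypothetical, and treating Valid to as the first day on which the schedule no longer runs is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Optional


def next_daily_run(
    now: datetime,
    at: time,
    tz: tzinfo,
    valid_from: date,
    valid_to: Optional[date] = None,
) -> Optional[datetime]:
    """Return the next daily run time, or None if the schedule has expired.

    `now` must be timezone-aware. `valid_to` is treated as exclusive
    (assumption): no run happens on or after that date.
    """
    local_now = now.astimezone(tz)
    # The schedule cannot start before its Valid from date.
    day = max(local_now.date(), valid_from)
    candidate = datetime.combine(day, at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has already passed, so move to tomorrow.
        candidate = datetime.combine(day + timedelta(days=1), at, tzinfo=tz)
    if valid_to is not None and candidate.date() >= valid_to:
        return None
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc)
    run = next_daily_run(now, time(2, 30), timezone.utc, date(2024, 5, 1))
    print(run)  # 2024-05-02 02:30:00+00:00
```

For schedules that do not fit a simple daily or weekly pattern, the Advanced configuration with a Cron expression is the more suitable option.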

Edit scheduled event

To edit, pause, or delete a scheduled event, select the required catalog item, and in the three dots menu select Schedule. Then, in the three dots menu for the event you want to modify, select the appropriate action.

If an event is enabled, you can see the next date and time when it will be executed.

Next steps

Monitor profiling progress

You can view more information about profiling jobs at any time in the Processing Center.

To access it, select Processing Center from the left navigation menu and then Open Processing Center.


From the Base jobs menu, select either Metadata import jobs or Profiling jobs.

When the job completes successfully, its Status changes from RUNNING to FINISHED. If the job fails, the status changes to FAILED and an error message is provided.

To view more details about a specific job, select the job directly from the list or locate it in the Processing Center.

If the job was started as part of a documentation flow, its Execution type is FLOW. For manually initiated jobs, the type is MANUAL.
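The status and execution type values described above can be summarized in a small data model. The following Python sketch is purely illustrative: the `Job` record, its fields, and the `failed_jobs` helper are hypothetical and do not correspond to any product API. Only the values RUNNING, FINISHED, FAILED, FLOW, and MANUAL come from this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


class ExecutionType(Enum):
    FLOW = "FLOW"      # started as part of a documentation flow
    MANUAL = "MANUAL"  # started manually


@dataclass
class Job:
    # Hypothetical record mirroring what the job list displays.
    name: str
    status: Status
    execution_type: ExecutionType
    error_message: str = ""


def failed_jobs(jobs: List[Job]) -> List[Job]:
    """Return the jobs that ended with the FAILED status."""
    return [job for job in jobs if job.status is Status.FAILED]


if __name__ == "__main__":
    jobs = [
        Job("customers profiling", Status.FINISHED, ExecutionType.MANUAL),
        Job("orders profiling", Status.FAILED, ExecutionType.FLOW, "Connection timed out"),
    ]
    for job in failed_jobs(jobs):
        print(f"{job.name}: {job.error_message}")
```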

View profiling results

Open the catalog item Profile & DQ insights tab to learn more about the state of your data and decide on the next steps for improving the data quality. For details, see Understand Profiling Results.
