Run Documentation Flow

After you connect to a data source, run data discovery to get a better picture of the data you’re working with.

There are three documentation flows available for data discovery:

Import

Imports all catalog items from a source and analyzes their metadata without accessing the data. Running this flow populates the metadata information for each catalog item (displayed on the Overview tab).

Discover

The fastest way to dig deeper into your data. This flow imports metadata and runs sample profiling on all catalog items in a source, which allows you to see the relationships between the assets and preview the data.

Document

The most complex documentation flow. This flow imports metadata and runs sample profiling on all catalog items, then identifies the most relevant assets and analyzes them using full profiling, DQ evaluation, and anomaly detection, giving you the most complete information about the data source.
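To compare what each default flow covers, the descriptions above can be summarized as a small lookup table. This is only an illustrative sketch: `FLOW_STEPS` and `flows_including` are hypothetical names made up for this example and are not part of any Ataccama API.

```python
# Illustrative model of the three default documentation flows.
# These names are assumptions for this sketch, not a product API.
FLOW_STEPS = {
    "Import": ["metadata import"],
    "Discover": ["metadata import", "sample profiling"],
    # In the Document flow, the last three steps run only on the
    # assets the flow identifies as most relevant.
    "Document": [
        "metadata import",
        "sample profiling",
        "full profiling",
        "DQ evaluation",
        "anomaly detection",
    ],
}


def flows_including(step):
    """Return the names of the default flows that include the given step."""
    return [name for name, steps in FLOW_STEPS.items() if step in steps]


if __name__ == "__main__":
    for flow, steps in FLOW_STEPS.items():
        print(f"{flow}: {', '.join(steps)}")
```

For example, `flows_including("sample profiling")` lists the flows that let you preview data, which can help when choosing the lightest flow that meets your needs.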

In addition, you can customize the default documentation flows as needed, or create a custom one (see Configure Documentation Flow).

If catalog items that are included in monitoring projects are profiled as part of the Discover or Document flow, any detected anomalies are visible in the monitoring project results.

We do not recommend making changes to catalog item metadata after it’s been processed using a documentation flow. If attributes (or tables) are renamed, modified, or removed, subsequent documentation attempts might fail.

However, if such changes are required, first delete the catalog item and all the related objects, such as monitoring projects, then rerun the documentation flow to import and profile the data again.

Ad hoc documentation flow

To run a documentation flow on an ad hoc basis:

A whole source

  1. In Data Catalog > Sources, select the required source and from the document menu, select the flow you want to run: Import, Discover, or Document. Confirm your choice when prompted (Import or Proceed).

    Run documentation flow

Specific assets

If your data source is a relational database, you can choose the schemas and/or tables that you want to analyze. If your data source is a file system, use this to import only specific files.

  1. In Data Catalog > Sources, select the required source and switch to the Connections tab.

  2. Open the [source name] connection browser, select all the required assets and, in the ribbon that appears, select the flow you want to run: Import to catalog, Discover, or Profile. Confirm your choice when prompted (Import or Proceed).

    To select specific tables from a schema, select the schema first, then choose the required tables.
    Run flow on selected assets

While the documentation flow is running, you can view the details by selecting Show details from the source detailed view.

Documentation flow details

Alternatively, track the progress using the Processing Center. See Monitor flow progress.

You can also view more details about the status of a particular source by going to Sources > [your data source] > Connection Details and selecting Show details.

Documentation flow progress

When the documentation flow finishes, it remains in the Running status until it is published. Make sure to publish changes.

To view a list of imported (Import flow) or profiled (Discover and Document flows) catalog items, open the Catalog Items tab of the source.

Imported and profiled items in the Catalog Items tab

Once the documentation flow is completed, create tasks for the actions to perform next on the analyzed assets. For instance, create a task detailing which catalog items need to be fully profiled or a task suggesting how to address the detected anomalies.

Next steps

Monitor flow progress

You can view more information about the jobs started by the flow at any time in the Processing Center.

To access it, select Processing Center from the left navigation menu and then Open Processing Center.

Track flow progress

When a job completes successfully, its Status changes from RUNNING to FINISHED. If the job fails, the status changes to FAILED and an error message is provided.

To view more details about a specific job, select the job directly from the list or locate it in the Processing Center.

If the job was started as part of a documentation flow, its Execution type is FLOW. For manually initiated jobs, the type is MANUAL.
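The status and execution type values described above can be modeled as plain records, for instance when reviewing a list of jobs copied out of the Processing Center. The following is a minimal sketch under stated assumptions: the `Job` class, its field names, and `failed_flow_jobs` are hypothetical and do not reflect how the Processing Center exposes job data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions for this sketch."""
    name: str
    status: str          # "RUNNING", "FINISHED", or "FAILED"
    execution_type: str  # "FLOW" (started by a documentation flow) or "MANUAL"
    error: str = ""      # error message, expected only for FAILED jobs


def failed_flow_jobs(jobs):
    """Return the jobs that were started by a documentation flow and failed."""
    return [
        job for job in jobs
        if job.execution_type == "FLOW" and job.status == "FAILED"
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Job("metadata import", "FINISHED", "FLOW"),
        Job("sample profiling", "FAILED", "FLOW", "connection timed out"),
        Job("full profiling", "FAILED", "MANUAL", "table not found"),
    ]
    for job in failed_flow_jobs(sample):
        print(f"{job.name}: {job.error}")
```

Filtering on both fields separates failures caused by a documentation flow from failures in manually started jobs, which usually need to be investigated separately.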

View profiling results

Open the source and select a catalog item to learn more about the state of your data (available from the Profile and DQ insights tab) and decide on the next steps for improving the data quality. For details, see Understand Profiling Results.
