Configure Documentation Flow
You can configure which automatic actions are performed by each documentation flow option (Import, Discover, Document, Sweep), or create a custom one.
How documentation flows work
The documentation flows are optimized to deliver you the most important information as quickly as possible. To achieve this, assets are handled with different priorities depending on a set of predefined criteria.
The relevancy of each asset is decided based on the following conditions and in the following order:
| Out of the default documentation flows, only the Document flow configuration has these configuration properties defined. |
-
Catalog items whose metadata contains at least one value from the reference list provided are fully profiled. Such assets are fully profiled before sample profiling of other assets starts.
If there are no matches, the asset is processed using sample profiling instead, unless its type is defined in the fullProfileOnly condition. For example, in the default Document flow configuration, file catalog items can only be fully profiled.
-
Catalog items with at least one glossary term applied are fully profiled. Higher priority is given to catalog items whose total number of unique glossary terms is equal to or greater than the number set in the configuration.
By default, full profiling first runs on assets with at least two unique glossary terms.
-
Whether DQ evaluation is performed on a catalog item depends on the number of unique glossary terms linked to DQ rules that are applied to that asset. Higher priority is assigned to assets with two or more such glossary terms.
By default, if an asset has at least one glossary term with an associated DQ rule, DQ evaluation is automatically initiated after full profiling.
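The decision order above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the product's implementation: the Asset class, the plan function, the "otherCatalogItem" type, and the "customer" reference value are hypothetical, and treating a reference list match as high priority is a simplification. The thresholds mirror the defaults described above.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a catalog item (not a product type)."""
    name: str
    item_type: str
    attribute_names: list = field(default_factory=list)
    term_count: int = 0     # unique glossary terms applied
    dq_term_count: int = 0  # unique glossary terms linked to DQ rules


# Thresholds follow the defaults described above; referenceList is made up.
CONFIG = {
    "referenceList": ["customer"],
    "profilingTermCountThreshold": 1,
    "profilingTermCountHighPriorityThreshold": 2,
    "dqEvalTermCountThreshold": 1,
    "dqEvalTermCountHighPriorityThreshold": 2,
    "fullProfileOnly": ["fileCatalogItem"],
}


def plan(asset, cfg=CONFIG):
    """Return (action, priority) pairs this sketch would schedule for one asset."""
    # Assumption: exact, case-insensitive match on item and attribute names.
    wanted = {value.lower() for value in cfg["referenceList"]}
    names = [asset.name, *asset.attribute_names]
    in_reference_list = any(n.lower() in wanted for n in names)

    full = (
        in_reference_list
        or asset.term_count >= cfg["profilingTermCountThreshold"]
        or asset.item_type in cfg["fullProfileOnly"]
    )
    if not full:
        return [("SAMPLE_PROFILING", "normal")]

    # Simplification: a reference list match is treated as high priority.
    high = (
        in_reference_list
        or asset.term_count >= cfg["profilingTermCountHighPriorityThreshold"]
    )
    steps = [("PROFILING", "high" if high else "normal")]

    # DQ evaluation follows full profiling when enough DQ-linked terms exist.
    if asset.dq_term_count >= cfg["dqEvalTermCountThreshold"]:
        dq_high = asset.dq_term_count >= cfg["dqEvalTermCountHighPriorityThreshold"]
        steps.append(("DQ", "high" if dq_high else "normal"))
    return steps
```

For example, plan(Asset("orders", "otherCatalogItem")) yields only sample profiling, while the same asset with two glossary terms, one of them linked to a DQ rule, is scheduled for high-priority full profiling followed by DQ evaluation.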
Configure data discovery
Configure actions for data discovery documentation flows (Import, Discover, Document).
The following actions can be added to the flow configuration:
-
IMPORT: Metadata import. This has to be the first step of the documentation flow.
-
SAMPLE_PROFILING: Sample profiling. Uses settings of the Sample profiling configuration defined in Global Settings > Profiling.
-
PROFILING: Full profiling. Uses settings of the Full profiling configuration defined in Global Settings > Profiling.
-
DQ: DQ evaluation.
-
ANOMALY_DETECTION: Anomaly detection. Uses anomaly detection settings (sensitivity and model) of the Full profiling configuration defined in Global Settings > Profiling.
| Use the Document flow configuration as a template for your new documentation flow. |
The conditions are determined by the following properties:
-
referenceList: A list of terms used to prioritize catalog items for full profiling. Matched against catalog item and attribute names (not case sensitive).
-
profilingTermCountThreshold: The minimum number of glossary terms applied to a catalog item for it to be fully profiled in a documentation flow.
-
profilingTermCountHighPriorityThreshold: The minimum number of glossary terms applied to a catalog item for it to receive higher priority during full profiling in a documentation flow.
-
dqEvalTermCountThreshold: The minimum number of glossary terms with associated DQ rules applied to a catalog item for DQ evaluation to run after full profiling.
-
dqEvalTermCountHighPriorityThreshold: The minimum number of glossary terms with associated DQ rules applied to a catalog item for it to receive higher priority during DQ evaluation in a documentation flow.
-
fullProfileOnly: Catalog item types that can only be fully profiled (for example, fileCatalogItem). For all available values, go to Global Settings > Metadata model: filter by catalogItem for the list of catalog item types, then switch to the Model Graph tab to see the hierarchy (follow the Extends connection type).
For more information, see How documentation flows work.
| The Configuration field is a blank text field without any input validation. As the configuration needs to be provided in JSON format, make sure there are no errors in the syntax, otherwise the flow fails. |
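Because the field itself does not validate input, it can help to check the JSON before pasting it in. The following Python sketch does that with the standard json module. The sample layout (properties as top-level keys next to "actions", integer thresholds, lists of strings) is an assumption made by analogy with the Sweep sample configuration on this page, and the "customer" reference value and the check_flow_config helper are hypothetical. The Document flow configuration in your environment remains the authoritative template.

```python
import json

# Actions listed in this section for data discovery flows.
KNOWN_ACTIONS = {"IMPORT", "SAMPLE_PROFILING", "PROFILING", "DQ", "ANOMALY_DETECTION"}

# Assumed layout; compare with the Document flow configuration before use.
SAMPLE = """
{
  "actions": ["IMPORT", "SAMPLE_PROFILING", "PROFILING", "DQ", "ANOMALY_DETECTION"],
  "referenceList": ["customer"],
  "profilingTermCountThreshold": 1,
  "profilingTermCountHighPriorityThreshold": 2,
  "dqEvalTermCountThreshold": 1,
  "dqEvalTermCountHighPriorityThreshold": 2,
  "fullProfileOnly": ["fileCatalogItem"]
}
"""


def check_flow_config(text):
    """Return a list of problems found in a configuration string (empty if none)."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(cfg, dict):
        return ["top level must be a JSON object"]

    actions = cfg.get("actions")
    if not isinstance(actions, list) or not actions:
        return ['"actions" must be a non-empty list']

    problems = []
    unknown = [a for a in actions if not isinstance(a, str) or a not in KNOWN_ACTIONS]
    if unknown:
        problems.append(f"unknown actions: {unknown}")
    # The import step has to come first whenever it is part of the flow.
    if "IMPORT" in actions and actions[0] != "IMPORT":
        problems.append("IMPORT has to be the first action")
    return problems


if __name__ == "__main__":
    print(check_flow_config(SAMPLE) or "no problems found")
```

The check only covers syntax, action names, and the position of IMPORT. It does not confirm that the application accepts the configuration.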
Configure catalog synchronization
Configure the Sweep documentation flow to keep your catalog synchronized with data sources.
The following action can be added to the flow configuration:
-
SWEEP: Sweep documentation flow. Configure this action on its own. Combining it with other actions like profiling or DQ evaluation is not meaningful.
When you schedule the Sweep documentation flow (see Schedule catalog synchronization), you must also configure the following property:
-
scheduledSweepDeleteLastModifiedThreshold: Enables automatic deletion of obsolete catalog items when running the Sweep flow on a schedule. The threshold value defines how much time must pass since a catalog item was last modified or profiled before it is automatically deleted. Format: d (days), m (months), y (years).
Sample configuration
{"actions": ["SWEEP"], "scheduledSweepDeleteLastModifiedThreshold": "7d"}
A catalog item is automatically deleted only when both of the following conditions are met:
-
The item no longer exists in the data source (deleted or renamed).
-
The item has not been modified and has not been profiled within the threshold period.
As a fail-safe, if a connection fails during the sweep operation, only catalog items from successful connections are deleted. Items from failed connections are skipped to prevent accidental data loss.
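The deletion rule can be sketched as follows. This is a hedged illustration, not the product's logic: parse_threshold and should_delete are hypothetical helpers, the threshold is assumed to be a whole number followed by a single unit letter, and months and years are approximated as 30 and 365 days because the source does not say how the application counts them.

```python
import re
from datetime import datetime, timedelta

# Assumption: approximate day counts for months and years.
DAYS_PER_UNIT = {"d": 1, "m": 30, "y": 365}


def parse_threshold(value):
    """Convert a value such as "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dmy])", value)
    if match is None:
        raise ValueError(f"unsupported threshold: {value!r}")
    amount, unit = match.groups()
    return timedelta(days=int(amount) * DAYS_PER_UNIT[unit])


def should_delete(exists_in_source, last_modified, last_profiled, now,
                  threshold, connection_ok=True):
    """Apply both deletion conditions plus the failed-connection fail-safe."""
    if not connection_ok:
        return False  # items from failed connections are skipped
    if exists_in_source:
        return False  # only items missing from the data source qualify
    cutoff = now - parse_threshold(threshold)
    # Assumption: an item that was never profiled counts as not recently profiled.
    not_recently_profiled = last_profiled is None or last_profiled < cutoff
    return last_modified < cutoff and not_recently_profiled
```

With a threshold of "7d", an item that has disappeared from the data source is kept if it was modified or profiled within the last seven days, and becomes eligible for deletion once both timestamps are older than that.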
Manage documentation flows
View documentation flow settings
To view the documentation flow settings, go to Global Settings > Documentation Flow.
For each flow, the following information is shown:
-
Name: The name of the flow.
-
Order: The order in which the option is shown in the Documentation Flow menu.
-
Configuration: The documentation flow settings, that is, which actions are automatically performed on the data and the conditions under which they are initiated.
Create a new documentation flow
To create a new documentation flow:
-
In Global Settings > Documentation Flow, select Create.
-
Provide the following information:
-
Name: A unique name for the documentation flow.
-
Description: A description of the documentation flow and when to use it.
-
Order: The position at which the flow is displayed in the Documentation Flow menu.
-
Configuration: A list of actions that are performed within the flow and the associated conditions. See Configure data discovery and Configure catalog synchronization for available options.
-
Select Save and publish your changes to have the flow displayed in the Documentation Flow menu.
Edit documentation flow
To edit the configuration of an existing documentation flow, go to Global Settings > Documentation Flow, open the required configuration, and select Edit in the three dots menu.
Update the configuration as needed. For details about the available options, see Create a new documentation flow. Next, select Save and publish your changes.
Delete documentation flow
To delete a documentation flow, go to Global Settings > Documentation Flow, open the required configuration, and select Delete in the three dots menu. Alternatively, select one or more flows from the listing and select Delete in the ribbon that appears.
Confirm your choice when prompted and publish your changes.