Frequently Used Steps

Ataccama products provide various steps and functions for building plan files. The algorithms and logic of a plan file vary from project to project; the following sections introduce the steps you are most likely to use.

To learn more about functions, see Commonly Used Functions.

Steps: an overview

Steps perform many types of operations, such as reading, transforming, filtering, and categorizing data. The following tables give an overview of some of the most frequently used steps and what they do.

A complete description of steps and their usage can be found in Product Help (Help > Help Contents in the main menu) under Steps.

Flow control steps

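Step name	Step description
Condition	Splits the data flow by a condition. Records that meet it go to the right output, the others go to the left output.
Filter	Passes only the records that meet the condition to the output.
Extract Filter	Splits the data flow by a condition. Records that meet it go to the right output; all records go to the left output.
Multiplicator	Copies the data flow to several outputs without modifying it.
Trash	Discards the data flow.
Join	Joins data flows, similar to an SQL table join.
Union	Merges data flows, similar to an SQL table union.
Union Same	Like Union, but applies only if the input flows have exactly the same format.
Alter Format	Adds or removes columns.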
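
The routing behavior in the table above can be illustrated with a short Python sketch. It is a conceptual model only, not Ataccama code: plan files are built from steps in the product, and the functions here (condition, filter_step, extract_filter, multiplicator, join) are hypothetical stand-ins that treat a data flow as a list of records, each a dict of column values. The join shown is an inner join on a single key column, which is just one kind of join.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a data flow is a list of records; each record maps column -> value.
Record = Dict[str, object]
Flow = List[Record]
Predicate = Callable[[Record], bool]


def condition(flow: Flow, pred: Predicate) -> Tuple[Flow, Flow]:
    """Condition: returns (left, right); matching records go right, the rest go left."""
    right = [r for r in flow if pred(r)]
    left = [r for r in flow if not pred(r)]
    return left, right


def filter_step(flow: Flow, pred: Predicate) -> Flow:
    """Filter: only matching records reach the single output."""
    return [r for r in flow if pred(r)]


def extract_filter(flow: Flow, pred: Predicate) -> Tuple[Flow, Flow]:
    """Extract Filter: returns (left, right); all records go left, matching ones go right."""
    return list(flow), [r for r in flow if pred(r)]


def multiplicator(flow: Flow, outputs: int) -> List[Flow]:
    """Multiplicator: every output receives the same, unmodified records."""
    return [list(flow) for _ in range(outputs)]


def join(left: Flow, right: Flow, key: str) -> Flow:
    """Inner join on one key column, comparable to SQL INNER JOIN ... USING (key)."""
    index: Dict[object, Flow] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    # Merge each left record with every right record that shares its key value.
    return [{**lr, **rr} for lr in left for rr in index.get(lr[key], [])]


if __name__ == "__main__":
    people = [{"id": 1, "age": 17}, {"id": 2, "age": 42}]
    is_adult: Predicate = lambda r: r["age"] >= 18
    print(condition(people, is_adult))
    print(join(people, [{"id": 2, "city": "Prague"}], "id"))
```

In this model, Extract Filter differs from Condition only in its left output, which receives every record rather than just the non-matching ones.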

Data parsing steps

Step name	Step description
Regex Matching	Parses the input string based on a regular expression. See also Regular Expressions.
Pattern Parser	Parses the input text based on the provided patterns. You must define all pattern components, plus any optional validations against dictionaries.
Guess Name Surname	A predefined version of Generic Parser for parsing names.
Strip Titles	Extracts strings found in a dictionary from the input. For example, it splits "James White PhD" into "James White" and "PhD".
Apply Replacements	Replaces values found in the input with their standardized forms, even when they are substrings of a longer value. For example, "5th Ave" becomes "5th Avenue".
Lookup	Looks up and validates values against a dictionary.
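
The Strip Titles and Apply Replacements examples above can be reproduced with a small Python sketch. The dictionaries TITLES and REPLACEMENTS and both functions are hypothetical; the real steps read their dictionaries from configured files and offer more options. Here, titles are matched as whole whitespace-separated tokens, and replacements are applied to whole words anywhere in the value, which is one way to replace a substring such as "Ave" inside "5th Ave".

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical dictionaries; the real steps load theirs from configured files.
TITLES = {"PhD", "MD", "Dr."}
REPLACEMENTS = {"Ave": "Avenue", "St": "Street", "Rd": "Road"}


def strip_titles(value: str, titles: Iterable[str]) -> Tuple[str, str]:
    """Split a value into (remaining text, extracted titles)."""
    title_set = set(titles)
    kept, found = [], []
    for token in value.split():
        # Tokens found in the dictionary are extracted; everything else is kept.
        (found if token in title_set else kept).append(token)
    return " ".join(kept), " ".join(found)


def apply_replacements(value: str, replacements: Dict[str, str]) -> str:
    """Replace dictionary words anywhere in the value with their standard form."""
    if not replacements:
        return value
    # \b on both sides limits matches to whole words, so "Avenue" is not touched.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, replacements)) + r")\b")
    return pattern.sub(lambda m: replacements[m.group(0)], value)
```

Because only whole words are matched, a value that already contains "Avenue" is left alone even though it starts with "Ave".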

Analysis steps

Step name	Step description
Profiling	Performs a comprehensive analysis and writes the results to a file (.profile).
Character Group Analyzer	Calculates masks of values, for example, digit to # and letter to A.
Word Analyzer	Replaces words found in reference dictionaries with symbols.
Relation Analysis	Calculates the number of missing foreign keys between two source flows.
Data Quality Indicator	Calculates statistics for a given set of business rules and adds a set of Boolean flags to each record.
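
To make the analysis steps more concrete, the following Python sketch shows a character mask in the style of Character Group Analyzer (digit to #, letter to A), Boolean rule flags with pass counts in the spirit of Data Quality Indicator, and a missing-foreign-key count like the one Relation Analysis reports. All names are hypothetical, and the real steps produce more detailed results.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record], bool]


def char_mask(value: str) -> str:
    """Mask a value: each digit becomes '#', each letter 'A', other characters stay."""
    return "".join(
        "#" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def quality_flags(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Dict[str, object]], Dict[str, int]]:
    """Add one Boolean flag per rule to each record and count passes per rule."""
    passed = {name: 0 for name in rules}
    flagged: List[Dict[str, object]] = []
    for rec in records:
        out: Dict[str, object] = dict(rec)
        for name, rule in rules.items():
            ok = bool(rule(rec))
            out[name] = ok
            passed[name] += int(ok)
        flagged.append(out)
    return flagged, passed


def missing_foreign_keys(
    child: List[Record], parent: List[Record], fk: str, pk: str
) -> int:
    """Count child records whose foreign key has no matching key in the parent flow."""
    parent_keys = {r[pk] for r in parent}
    return sum(1 for r in child if r[fk] not in parent_keys)
```

Masks pair naturally with rules: a rule such as "the ZIP mask equals #####" turns a format check into one of the Boolean flags.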

Match and merge steps

Step name	Step description
Unification	Assigns group IDs (client, candidate, and unification roles). Can process data incrementally using the repository.
Representative Creator	Creates a new record from each defined group (the records must already have group IDs). Can add calculated values to the original data flow.
Simple Group Classifier	Calculates the quality of groups: A (for automatic processing), U (unique), M (for manual processing), C (for additional data cleansing).
Unification Extended	Can run the matching process in mixed mode, that is, online and batch in parallel.
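
The following Python sketch illustrates the idea behind Unification and Representative Creator: matching records receive a shared group ID, and one record is then built per group. The matching rule (equal, case-insensitive values in any listed key column, applied transitively through a union-find structure) and the survivorship rule (most frequent non-empty value per column) are assumptions chosen for illustration. In the product, both are configured in the steps, and the Unification roles (client, candidate, unification) are not modeled here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def assign_group_ids(records: List[Record], keys: Sequence[str]) -> List[int]:
    """Give matching records a shared group ID (1, 2, ...).

    Hypothetical match rule: two records match if any key column holds the same
    non-empty value, ignoring case and surrounding spaces. Matching is transitive.
    """
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: Dict[Tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        for key in keys:
            value = rec.get(key, "").strip().lower()
            if value:
                j = first_seen.setdefault((key, value), i)
                parent[find(i)] = find(j)  # merge the two groups

    # Renumber group roots as consecutive IDs in order of first appearance.
    group_of_root: Dict[int, int] = {}
    return [
        group_of_root.setdefault(find(i), len(group_of_root) + 1)
        for i in range(len(records))
    ]


def create_representatives(
    records: List[Record], group_ids: List[int]
) -> Dict[int, Record]:
    """Build one record per group from the most frequent non-empty value per column."""
    groups: Dict[int, List[Record]] = {}
    for gid, rec in zip(group_ids, records):
        groups.setdefault(gid, []).append(rec)
    representatives: Dict[int, Record] = {}
    for gid, members in groups.items():
        rep: Record = {}
        for col in sorted({c for m in members for c in m}):
            values = [m[col] for m in members if m.get(col)]
            # Counter.most_common breaks ties by first occurrence.
            rep[col] = Counter(values).most_common(1)[0][0] if values else ""
        representatives[gid] = rep
    return representatives
```

Using union-find makes matching transitive: if record A matches B and B matches C, all three share one group ID even when A and C do not match directly.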
