Pattern Parser Step

Extract structured data from unstructured text attributes using pattern recognition.

This step is available for standalone plans, embedded plans, and transformation catalog items.

The Pattern Parser step parses unstructured or formatted text into separate attributes using AI-powered pattern generation. Use this step when you need to extract parts from addresses, product codes, log entries, or other formatted text.

Extracted values are saved as strings in new attributes. To convert them to other data types, use the Transform Data step after parsing.

How pattern parsing works

Pattern parsing involves three key concepts:

Segments

Named elements that make up a pattern. Each segment represents a distinct part of your data (for example, country code, area code, number). You can define up to 6 segments.

Patterns

Combinations of segments that describe valid data formats. A pattern specifies which segments appear and in what order. For example, {CountryCode}{AreaCode}{Number} or {AreaCode}{Number}.

Constraints

Rules that control which patterns are valid. Constraints include minimum and maximum segment counts, required segments, and segment dependencies.

Pattern matching process

When the step processes a record:

  1. The input value is compared against all valid patterns.

  2. If a match is found, the value is parsed into its segments.

  3. Each segment value is output as a separate attribute.
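The three steps above can be sketched in Python. This is an illustrative model only, not the step's actual implementation: the segment regular expressions, the `SEGMENTS` and `PATTERNS` names, the `parse_value` function, and the assumption that segments are separated by a single space are all hypothetical.

```python
import re

# Hypothetical segment definitions: segment name -> regular expression.
SEGMENTS = {
    "CountryCode": r"\+\d{1,3}",
    "AreaCode": r"\d{3}",
    "Number": r"\d{6,7}",
}

# Valid patterns in priority order; each is an ordered list of segment names.
PATTERNS = [
    ["CountryCode", "AreaCode", "Number"],
    ["AreaCode", "Number"],
]

def parse_value(value, separator=" "):
    """Return (pattern, {segment: text}) for the first matching pattern, or None."""
    for pattern in PATTERNS:
        # Build one regex with a named group per segment in the pattern.
        regex = re.escape(separator).join(
            f"(?P<{name}>{SEGMENTS[name]})" for name in pattern
        )
        match = re.fullmatch(regex, value)
        if match:
            # Each named group corresponds to one output attribute.
            return pattern, match.groupdict()
    return None
```

In this sketch, `parse_value("555 123456")` skips the first pattern because the value has no country code, then matches the second pattern and returns one value per segment.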

Configuration

The Pattern Parser step uses a three-step wizard to configure parsing patterns. Before you start the wizard, select the Attribute to parse.

To open the wizard, select Start setup.

This step uses AI to analyze your examples and automatically generate segments and patterns. AI assistance is strongly recommended for optimal configuration.

If AI features are turned off in your organization, you can still configure the step manually, but you will need to define all segments, detection methods, and constraints yourself without AI suggestions.

Pattern Parser configuration showing attribute selector and start setup button

Step 1: String examples

Provide example values and describe what they represent.

Examples (required)

Enter multiple example values from your attribute, one per line. More examples help the AI understand variations in your data format. For example, if your data sometimes includes a ZIP code and sometimes doesn’t, include examples of both scenarios to ensure all variations are recognized.

String description (optional)

Describe what the text represents and what parts you want to extract. Be specific about formats and structure. For example: "Product code followed by a 4-digit year."

The AI analyzes your examples and description to generate initial parsing segments.

Select Identify segments to continue.

Step 1 wizard showing Examples and String description fields

Step 2: Confirm segments

Review and adjust the AI-generated segments. Each segment represents a part of the text to extract.

Step 2 showing segment list and configuration panel

Segment management

The Identified segments section displays all detected segments as a reorderable list.

  • Drag to reorder: Use the drag handle icon to reorder segments.

  • Delete segment: Remove segments as needed.

  • Add segment: Select Add segment to manually add a new segment to the configuration.

Configuring each segment

Select a segment to expand its configuration panel. Each segment has the following required fields:

Segment name (required)

The internal identifier for the segment (for example, street_number, zip_code).

Output attribute name (required)

The name of the output attribute that will contain the extracted value for this segment.

Segment identification

Detection method (required)

Specifies the type of data the segment matches. Each detection method includes a helper description explaining its behavior.

Available segment types

Pattern Parser supports multiple segment types for matching different kinds of data:

| Detection method | Description | Example matches |
|---|---|---|
| Single letter | Single alphabetic character | A, x, M |
| Single word | Single word without spaces | Main, Street, Springfield |
| Multiple words | Multiple words with spaces | Main Street, New York City |
| Spaced-out word | Word with interlaced characters or separators | A-B-C, 1.2.3 |
| Alphanumeric word | Alphanumeric word (letters and numbers) | ABC123, Order2024 |
| Single digit | Single numeric digit | 0, 5, 9 |
| Number | Any numeric value with optional separators | 123, 45.67, 1 234 |
| Whole number (±) | Whole number that can include a plus or minus sign, with optional separators | +7, -12, 25, 1 234, 0 |
| Decimal number | Decimal number | 3.14, 99.99, 0.001 |
| Roman numeral | Roman numeral | I, IV, XII, MCMXC |
| Any text (*) | Any sequence of characters | Variable content |
| Exact text match | Exact string match | ERROR, USD, APPROVED |
| Custom pattern (regexp) | Regular expression pattern | Custom patterns like \d{3}-\d{3}-\d{4} |
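To get a feel for what a few of these detection methods accept, the following Python sketch pairs them with approximate regular expressions. The expressions are assumptions made for illustration; the documentation does not specify how the step implements each method, and only the Custom pattern (regexp) expression is taken from the list above. The `APPROXIMATIONS` and `matches` names are hypothetical.

```python
import re

# Rough, assumed regex equivalents for a handful of detection methods.
APPROXIMATIONS = {
    "Single letter": r"[A-Za-z]",
    "Single digit": r"[0-9]",
    "Decimal number": r"[0-9]+\.[0-9]+",
    # The lookahead rejects the empty string, which the rest would accept.
    "Roman numeral": (
        r"(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})"
        r"(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    ),
    "Custom pattern (regexp)": r"\d{3}-\d{3}-\d{4}",
}

def matches(method, text):
    """Return True if the whole text fits the approximation for the method."""
    return re.fullmatch(APPROXIMATIONS[method], text) is not None
```

For example, `matches("Custom pattern (regexp)", "555-123-4567")` is true in this sketch, while the same value without hyphens is rejected.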

Advanced options

For certain segment types (Number, Whole number (±)), additional fields are available under the Advanced section:

Minimum digits

Minimum number of digits required in the segment.

Maximum digits

Maximum number of digits allowed in the segment.

Allow digit grouping with a separator

Permits using separators between digit groups (for example, 1 234 instead of 1234).
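The effect of these options can be modeled with a small Python check. This is a sketch under stated assumptions, not the step's logic: the `is_whole_number` function and its parameters are hypothetical, the sign is assumed not to count as a digit, and digit groups are assumed to be three digits wide.

```python
import re

def is_whole_number(text, min_digits=1, max_digits=None,
                    allow_grouping=False, separator=" "):
    """Check a candidate Whole number (±) value against the advanced options."""
    # An optional leading sign is not counted as a digit (assumption).
    body = text[1:] if text.startswith(("+", "-")) else text
    if allow_grouping:
        # Assumption: a leading group of 1-3 digits, then groups of exactly 3.
        grouped = rf"[0-9]{{1,3}}(?:{re.escape(separator)}[0-9]{{3}})+"
        if re.fullmatch(grouped, body):
            body = body.replace(separator, "")
    if not re.fullmatch(r"[0-9]+", body):
        return False
    if len(body) < min_digits:
        return False
    return max_digits is None or len(body) <= max_digits
```

With grouping allowed, `1 234` is accepted as a four-digit number; with grouping off, the same value is rejected because of the space.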

Reference data validation

You can validate extracted values against a reference data table.

Reference data for validation

Choose Select reference data table to pick a reference table. The step then checks whether each extracted value exists in the selected table.
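Conceptually, this validation is a membership check. The sketch below assumes the reference table can be treated as a simple collection of allowed values; the `check_reference` function and the `STATES` list are hypothetical, and the sketch does not cover what the step does with values that fail validation.

```python
def check_reference(extracted_values, reference_values):
    """Map each extracted value to True if it exists in the reference set."""
    reference = set(reference_values)
    return {value: value in reference for value in extracted_values}

# Hypothetical reference table of valid state abbreviations.
STATES = ["IL", "MA", "NY"]
result = check_reference(["IL", "MA", "XX"], STATES)
```

Here `result` marks `IL` and `MA` as present in the reference data and `XX` as missing.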

Extracted values preview

The Extracted values preview section shows example values that would be extracted for the selected segment based on your input examples.

This preview is generated by AI from your examples and might not be perfect.

Select Review patterns to continue.

Step 3: Review and refine patterns

Review and configure pattern constraints, then refine the generated patterns.

Step 3 showing pattern constraints and review table

Pattern constraints

Pattern constraints are rules that control which segment combinations are valid and where each segment can appear. For example, if the segment ZIP code appears, it must be in the last position.

The wizard supports the following constraint types:

Position constraints
  • Always required: Segment must appear in every pattern.

  • Minimum number of segments: Set the minimum segment count for valid patterns.

  • Maximum number of segments: Set the maximum segment count for valid patterns.

Position in pattern

These constraints apply only when the segment appears in a pattern:

  • First position: Segment must be in the first position.

  • Last position: Segment must be in the last position.

  • Between first and last: Segment must appear between the first and last positions.

Dependency
  • Require if present: If one segment appears, other specified segments must also appear.

  • Make other segments optional: If one segment appears, other specified segments become optional.

Grouping
  • At least one appears: At least one of the specified segments must appear in the pattern.

  • Appear together: All specified segments must appear together or not at all.

Exclusivity
  • Cannot appear together: The specified segments cannot appear in the same pattern.

  • Only one can appear: Only one of the specified segments can appear in any pattern.

Select Add constraint to add constraint rules for segments.
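One way to picture how constraints narrow down the pattern list is to enumerate candidate patterns and discard those that break a rule. The Python sketch below assumes that candidates are order-preserving subsets of the segment list, which the documentation does not confirm. It models only three constraint types, and `candidate_patterns`, `violations`, and the rule names are hypothetical.

```python
from itertools import combinations

SEGMENTS = ["street_number", "street_name", "city", "state", "zip_code"]

def candidate_patterns(segments, min_count, max_count):
    """Yield order-preserving segment subsets within the count limits."""
    for size in range(min_count, max_count + 1):
        yield from combinations(segments, size)

def violations(pattern, always_required=(), last_position=(), appear_together=()):
    """Return the reasons a pattern breaks the constraints (empty if valid)."""
    reasons = []
    for name in always_required:
        if name not in pattern:
            reasons.append(f"{name} is always required")
    for name in last_position:
        # Position constraints apply only when the segment is present.
        if name in pattern and pattern[-1] != name:
            reasons.append(f"{name} must be in the last position")
    for group in appear_together:
        present = [name in pattern for name in group]
        if any(present) and not all(present):
            reasons.append(f"{', '.join(group)} must appear together")
    return reasons

rules = dict(
    always_required=["street_number", "street_name", "city", "state"],
    last_position=["zip_code"],
)
valid = [p for p in candidate_patterns(SEGMENTS, 4, 5) if not violations(p, **rules)]
```

With these rules, only two candidates remain in `valid`: the four required segments on their own, and the same four followed by `zip_code`.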

Review patterns

The pattern review section displays counts for Valid patterns and Excluded patterns. Use Filter by segments to filter the pattern list.

After generation, patterns are organized into three categories:

Valid patterns

Patterns that should match input data. Records matching these patterns are parsed successfully.

Excluded patterns

Patterns you’ve manually excluded. Use this to remove patterns that are technically valid but shouldn’t match your data.

Invalid patterns

Patterns that don’t meet your constraints. These patterns are not used for matching but can be viewed using the three-dot menu.

Use the three-dot menu to:

  • Reorder manually: Change the order of patterns.

  • View invalid patterns: View patterns that failed constraints.

View invalid patterns

Select View invalid patterns to view patterns rejected due to constraint violations. The modal shows each invalid pattern, the reason it failed (such as "Number of segments is less than 3"), and an Include option to add it to valid patterns.

Pattern list

The pattern table shows the Order, Pattern segments, Preview values, and an Exclude option for each pattern. The top pattern is applied first during matching.

You can exclude individual patterns using Exclude next to each pattern. Excluded patterns will not be used for matching but remain visible in the Excluded patterns category.

Select Finish to complete the wizard.

Result: Parsing patterns

After completing the wizard, the Parsing patterns section displays all configured patterns in priority order. Each pattern shows its segment combination, and patterns are applied in order from top to bottom during data processing.

Parsing patterns section showing configured patterns in priority order

Use Edit pattern setup to reopen the wizard and modify your configuration, or Regenerate preview to test the patterns with sample data from your dataset.

Data preview showing parsed attribute values

The parsed values are available as output attributes for use in subsequent plan steps or the final output.

The step automatically creates an attribute to store the name of the pattern that successfully parsed each record. This attribute is named pattern_<input_attribute_name> (for example, if parsing an attribute called Product_codes, the pattern attribute will be named pattern_Product_codes).

The separator after pattern follows the naming of the input attribute: an underscore if the input attribute name uses underscores, a space if it uses spaces.
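The naming rule can be expressed as a short Python helper. This is a sketch only: `pattern_attribute_name` is a hypothetical function, and falling back to an underscore when the input name contains neither separator is an assumption.

```python
def pattern_attribute_name(input_attribute):
    """Derive the pattern attribute name from the input attribute name."""
    # Assumption: use a space only when the input name itself contains spaces.
    separator = " " if " " in input_attribute else "_"
    return f"pattern{separator}{input_attribute}"
```

Under these assumptions, `Product_codes` gives `pattern_Product_codes` and `Product codes` gives `pattern Product codes`.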

This attribute is created automatically in the background and is not directly visible in the step configuration.

Example 1: Parsing product codes

This example shows extracting structured information from product SKUs.

Step 1: String examples

Enter examples:

PRD-2024-Q1-00125-XL
SRV-2023-Q4-00891-NA
PRD-2024-Q2-00001-SM

String description:

Product SKU with product line, year, quarter, sequence number, and size code separated by hyphens

Step 1 showing product code examples and description

Step 2: Confirm segments

The AI generates segments:

  • Product line (detection method: Exact text match)

  • Year (detection method: Whole number (±))

  • Quarter (detection method: Custom pattern (regexp), pattern: Q[1-4])

  • Sequence number (detection method: Whole number (±))

  • Size (detection method: Exact text match)

Review and configure each segment according to your needs.

Step 2 showing segment configuration for product codes

Step 3: Review patterns

Configure constraints:

  • Set Minimum number of segments: 5

  • Set Maximum number of segments: 5

  • Mark Product line, Year, Quarter, Sequence number, and Size as Always required

  • Set Product line to First position

  • Set Size to Last position

The pattern review displays all valid patterns.

Step 3 showing pattern constraints and review

Results

After completing the wizard, the Pattern Parser step parses product codes into individual segments:

| Input | product_line | year | quarter | sequence | size |
|---|---|---|---|---|---|
| PRD-2024-Q1-00125-XL | PRD | 2024 | Q1 | 00125 | XL |
| SRV-2023-Q4-00891-NA | SRV | 2023 | Q4 | 00891 | NA |

Parsed product code data showing extracted segments
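For comparison, the same extraction can be approximated outside the step with a single regular expression. The Python sketch below is an illustration rather than the pattern the wizard generates: the `SKU` expression, the fixed digit counts, and the `parse_sku` function are assumptions based on the three example values.

```python
import re

SKU = re.compile(
    r"(?P<product_line>PRD|SRV)-"  # known values from the examples
    r"(?P<year>[0-9]{4})-"         # assumed 4-digit year
    r"(?P<quarter>Q[1-4])-"        # same expression as the Quarter segment
    r"(?P<sequence>[0-9]{5})-"     # assumed 5-digit sequence number
    r"(?P<size>XL|NA|SM)"          # known values from the examples
)

def parse_sku(code):
    """Return the extracted segments as strings, or None if the code does not match."""
    match = SKU.fullmatch(code)
    return match.groupdict() if match else None

row = parse_sku("PRD-2024-Q1-00125-XL")
year = int(row["year"])          # explicit conversion from string
sequence = int(row["sequence"])  # conversion drops the leading zeros
```

As in the step, every extracted value starts out as a string, so `00125` keeps its leading zeros until it is converted to a number.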

Example 2: Parsing US addresses

This example shows parsing a US address into its segments.

Step 1: String examples

Enter examples (one per line):

123 Main Street, Springfield, IL 62701
456 Oak Avenue, Chicago, IL 60601
789 Park Drive, Boston, MA 02101

String description:

US mailing address with street number and name, city, state abbreviation, and 5-digit zip code

Step 2: Confirm segments

The AI generates segments for each address part. Review and adjust:

  • street_number (detection method: Whole number (±))

  • street_name (detection method: Multiple words)

  • city (detection method: Multiple words)

  • state (detection method: Exact text match)

  • zip_code (detection method: Whole number (±))

Step 3: Review patterns

Configure constraints:

  • Mark street_number, street_name, city, and state as Always required

  • Set zip_code to Last position

  • Set Minimum number of segments: 4 and Maximum number of segments: 5

Results

| Input | street_number | street_name | city | state | zip_code |
|---|---|---|---|---|---|
| 123 Main Street, Springfield, IL 62701 | 123 | Main Street | Springfield | IL | 62701 |
| 456 Oak Avenue, Chicago, IL 60601 | 456 | Oak Avenue | Chicago | IL | 60601 |
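The optional zip_code segment in this example corresponds to an optional group in a regular expression. The following Python sketch shows the idea; it is not the pattern produced by the wizard, and the `ADDRESS` expression, the comma-and-space separators, and the two-letter state rule are assumptions drawn from the example values.

```python
import re

ADDRESS = re.compile(
    r"(?P<street_number>[0-9]+) "
    r"(?P<street_name>[A-Za-z]+(?: [A-Za-z]+)*), "
    r"(?P<city>[A-Za-z]+(?: [A-Za-z]+)*), "
    r"(?P<state>[A-Z]{2})"
    r"(?: (?P<zip_code>[0-9]{5}))?"  # optional: four or five segments in total
)

def parse_address(text):
    """Return the address segments, with zip_code set to None when absent."""
    match = ADDRESS.fullmatch(text)
    return match.groupdict() if match else None
```

An address without a ZIP code still parses in this sketch, with `zip_code` left empty, which mirrors the two valid patterns allowed by the constraints.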

Best practices

Provide multiple examples

Include several examples showing variations in your data format. More examples help the AI generate better segments.

Write clear descriptions

Explain what each part of the text represents. Be specific about formats (for example, "5-digit zip code" not just "zip code").

Choose specific detection methods

Use Whole number (±) for integers rather than Number. Use Exact text match for known fixed values rather than broad patterns like Any text (*).

Test with various inputs

After configuration, verify the parser works with different input variations.

Start simple

Begin with basic segments and constraints, then add complexity as needed.

Use pattern names for debugging

Use the automatically created pattern name attribute (pattern_<input_attribute_name>) to see which pattern matched each record.

Troubleshooting

No patterns generated

  • Check that minimum segment count doesn’t exceed available segments.

  • Verify constraints aren’t too restrictive.

  • Ensure segments match your sample data.

Some records fail to parse

  • Review failed records to identify format differences.

  • Consider making some segments optional.

  • Use broader segment types if needed.
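When reviewing failed records, it can help to group them by format before adjusting segments. The Python sketch below shows one common profiling approach, reducing each value to a shape mask. It runs outside the step, and the `shape` and `summarize_shapes` functions are hypothetical helpers, not product features.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to its format: letters become A, digits become 9."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", value)

def summarize_shapes(values):
    """Count how many values share each shape, most common first."""
    return Counter(shape(v) for v in values).most_common()
```

Shapes that appear among failed records but not among your wizard examples point to format variations that the current patterns do not cover.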

Wrong values extracted

  • Verify separator configuration for each segment.

  • Check that segment types match your data.

  • Confirm segment order matches input structure.

Limitations

  • Maximum of 6 segments per configuration.

  • Works best with consistent, predictable text formats.

  • Highly variable or irregular text may be difficult to parse reliably.

  • Complex nested structures may require multiple parsing steps.

  • All output attributes are strings. To use another data type, convert them in a subsequent step, such as Transform Data.
