AI Matching Configuration

The following properties configure the Matching Manager microservice. You can provide them through the Configuration Service, in the Matching Manager deployment, or in the configuration file ai-matching-matching-manager/etc/application.properties.
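As a minimal sketch of the file-based option, the fragment below sets the two MDC connection properties from the table below in `application.properties`. The host name `mdc.example.internal` is a hypothetical placeholder, not a value from this documentation; the port is the documented default, written out explicitly.

```properties
# ai-matching-matching-manager/etc/application.properties
# Hypothetical host name (an assumption); replace with the server where MDC is running.
ataccama.client.connection.mdc.host=mdc.example.internal
# Documented default gRPC port of the MDC server, stated explicitly here.
ataccama.client.connection.mdc.grpc.port=18581
```

Comments go on their own lines starting with `#`, because text after a value on the same line is read as part of the value.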

Matching Manager

Property	Data Type	Default Value	Description
`ataccama.client.connection.mdc.grpc.port`	`Number`	`18581`	The gRPC port of the server where MDC is running.
`ataccama.client.connection.mdc.host`	`String`	`localhost`	The IP address or the URL of the server where MDC is running.
`ataccama.one.ai-matching.matching-manager.grpc.server.listen-address`	`String`	`0.0.0.0`	The network address to which the Matching Manager gRPC server should bind.
`ataccama.one.ai-matching.matching-manager.grpc.server.port`	`Number`	`8640`	The port where the gRPC interface of the Matching Manager microservice is running.
`ataccama.one.ai-matching.matching-manager.http.server.listen-address`	`String`	`0.0.0.0`	The network address to which the Matching Manager HTTP server should bind.
`ataccama.one.ai-matching.matching-manager.http.server.port`	`Number`	`0.5`	The dedupe clustering decision threshold, which sets the trade-off between precision and recall. The value must be between 0 and 1. Increasing the value gives higher precision and lower recall, that is, fewer MERGE proposals and more SPLIT proposals. Conversely, decreasing the value gives lower precision and higher recall.
`ataccama.one.ai-matching.matching_steps.evaluation.groups_fetching_batch_size`	`Number`	`100`	The number of groups or clusters that are processed in a single batch when proposals are generated during the AI Matching evaluation. A higher number means that the processing is more efficient but requires more memory (RAM).
`ataccama.one.ai-matching.matching_steps.evaluation.scoring_batch_size`	`Number`	`5000`	The number of proposals that are processed in a single batch when proposals are scored during the AI Matching evaluation. A higher number means that the processing is more efficient but requires more memory (RAM).
`ataccama.one.ai-matching.matching_steps.initialization.sample_size`	`Number`	`1000000`	The number of records that are uniformly sampled from all the records fetched from MDM. Those records are the only ones used for initializing and training the AI Matching model.
`ataccama.one.ai-matching.matching_steps.initialization.training_sample_size`	`Number`	`40000`	The number of records that AI Matching selects, out of the records sampled according to the property `ataccama.one.ai-matching.matching_steps.initialization.sample_size`, for the actual training of the AI model. A higher value means that the model performs better, but the training takes more time.
`ataccama.one.ai-matching.matching_steps.rules_extraction.max_columns`	`Number`	`5`	The maximum number of columns in one extracted rule. A higher number means that the extracted rules can be more complex, that is, use more columns, but the rule extraction might take significantly longer.
`ataccama.one.ai-matching.matching_steps.rules_extraction.max_negative_pairs`	`Number`	`10000`	The maximum number of confident negative pairs that are considered for rule extraction. A higher number means that rule extraction is significantly slower, but the results could be more precise.
`ataccama.one.ai-matching.matching_steps.training.desired_model_quality`	`Number`	`0.0`	The minimum desired model quality after the training phase finishes. The value must be between 0 and 1 and represents the correctness (quality) of the trained model, measured on the pairs that the user provided during training. The higher the value, the more stringent the requirement for continuing to the next steps after the training phase. To improve the model quality, review the pairs already provided or provide additional pairs.
`ataccama.one.ai-matching.matching_steps.training.max_cross_validation_folds`	`Number`	`30`	The maximum number of folds (splits) used to evaluate model quality after training. Must be a non-negative integer. A higher value makes the evaluation more precise but also slower (roughly `max_cross_validation_folds` seconds). Values higher than the actual number of labeled training pairs have no effect. If set to 0 or 1, the cross-validation part of the evaluation is skipped, that is, the model is both trained and tested on all labeled training pairs.
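
The trade-offs described in the table can be sketched as a set of overrides for a memory-constrained deployment that also wants a stricter quality gate and a faster evaluation. All values below are illustrative assumptions chosen only to show the direction of each adjustment; they are not recommended settings.

```properties
# Illustrative values only (assumptions), not recommendations.

# Smaller batches lower memory (RAM) use at the cost of processing efficiency.
# Documented defaults are 100 and 5000.
ataccama.one.ai-matching.matching_steps.evaluation.groups_fetching_batch_size=50
ataccama.one.ai-matching.matching_steps.evaluation.scoring_batch_size=2500

# Require a higher model quality (range 0 to 1) before continuing past training.
# Documented default is 0.0.
ataccama.one.ai-matching.matching_steps.training.desired_model_quality=0.8

# Fewer folds make the quality evaluation faster but less precise.
# Documented default is 30; 0 or 1 skips cross-validation.
ataccama.one.ai-matching.matching_steps.training.max_cross_validation_folds=10
```

Properties that are not listed keep their default values from the table.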
