
Monitoring Metrics

This article lists the module-specific monitoring metrics available for the Ataccama ONE Platform.

DPM and DPE

DPM

For Data Processing Module (DPM), the endpoint publishes the metrics as follows:

Metric Metric type Description

ataccama_one_dpm_plugin_executor_jobs_count

Gauge

The number of jobs in each job status. Possible values of the status tag are as follows: unknown, submitted, preprocessing, queued, submitted_to_engine, starting, running, postprocessing, success, failure, killing, killed, disconnected, unrecognized.

To get a full list of all job statuses along with the respective number of jobs, you can use the following query:

ataccama_one_dpm_plugin_executor_jobs_count{status=~"$job_status"}

In that case, the variable job_status is determined as follows:

$job_status = label_values(ataccama_one_dpm_plugin_executor_jobs_count, status)

Alternatively, to retrieve the number of jobs for a specific status, use the following structure:

ataccama_one_dpm_plugin_executor_jobs_count{status="queued"}

ataccama_one_dpm_plugin_executor_queue_jobs_top_priority_job_age_millis

Gauge

The age, in milliseconds, of the highest-priority job in the DPM job queue, measured from the time the job is created to the time it is retrieved.

ataccama_one_dpm_engines_count_active

Gauge

The number of currently active DPEs.

ataccama_one_dpm_engines_count_connected

Gauge

The number of connected DPEs.

ataccama_one_dpm_engines_count_disconnected

Gauge

The number of disconnected DPEs.

ataccama_one_dpm_engines_count_inactive

Gauge

The number of currently inactive DPEs.

ataccama_one_dpm_events_engine_pick_seconds_max

Gauge

The maximum duration of retrieving DPM events, in seconds.

ataccama_one_dpm_events_engine_pick_seconds_sum

Gauge

The total time spent while retrieving DPM events, in seconds.

ataccama_one_dpm_events_engine_pick_seconds_count

Gauge

The number of attempts at retrieving DPM events.

ataccama_one_dpm_events_engine_processingJobs

Gauge

The number of jobs for which events are currently in memory waiting to be processed.

ataccama_one_dpm_engines_status_check_duration_seconds

Gauge

Measures how long it takes to check the status of DPEs, expressed in seconds.

ataccama_one_dpm_plugin_executor_job_to_kill_queue_size

Gauge

The number of jobs waiting to be canceled.

ataccama_one_dpm_plugin_executor_job_create_duration

Timer

The duration of creating jobs.

ataccama_one_dpm_plugin_executor_job_update_status_duration

Timer

The duration of updating jobs.

ataccama_one_dpm_plugin_executor_job_to_submit_add_duration

Timer

The duration of add operations on the submit job queue.

ataccama_one_dpm_plugin_executor_job_to_kill_add_duration

Timer

The duration of add operations on the cancel job queue.

ataccama_one_dpm_plugin_executor_job_killAsAdmin_duration

Timer

The duration of canceling jobs as admin.

ataccama_one_dpm_plugin_executor_job_submit_duration

Timer

The duration of submitting jobs.

ataccama_one_dpm_plugin_executor_job_kill_duration

Timer

The duration of canceling jobs.

ataccama_one_dpm_plugin_executor_job_resubmit_duration

Timer

The duration of resubmitting jobs.

ataccama_one_dpm_plugin_executor_job_submit_message_size

Gauge

The size of incoming JobSubmit requests.

ataccama_one_dpm_plugin_executor_file_upload_duration

Timer

The duration of uploading files.

ataccama_one_dpm_plugin_executor_file_provider_download_duration

Timer

The duration of downloading provider files.

ataccama_one_dpm_plugin_executor_job_preprocess_duration

Timer

The duration of job preprocessing.

Deprecated. Use ataccama.one.dpm.plugin.executor.job.preprocess.plugin instead.

ataccama_one_dpm_plugin_executor_job_status_change_duration

Timer

The duration of job status changes. The status transition is determined using tags status.old and status.new.

ataccama_one_dpm_plugin_executor_job_preprocess_plugin

Timer

The duration of job preprocessing.

The metric uses name tags that correspond to job types. The related histogram and the 0.5 and 0.95 percentiles are also published.

ataccama_one_dpm_plugin_executor_job_postprocess_plugin

Timer

The duration of job postprocessing.

The metric uses name tags that correspond to job types. The related histogram and the 0.5 and 0.95 percentiles are also published.

ataccama_one_dpm_plugin_executor_thread_pool

Threadpool

Thread pool metrics. The name of the thread pool is provided in the tag thread.pool.name.

ataccama_one_dependency_health

Gauge

Health check metrics grouped by dependency and depender (in this case, DPM).

Must be explicitly enabled through the property ataccama.one.dpm.health-checks.expose-as-metrics=true. Other health checks are explicitly enabled using management.endpoint.health.mmm-be.enabled=true.

ataccama_one_dpm_engines_status_check_duration_max

Gauge

The maximum duration of a DPE status check, expressed in seconds.

ataccama_one_dpm_engines_status_check_duration_min

Gauge

The minimum duration of a DPE status check, expressed in seconds.

ataccama_one_dpm_engines_status_check_duration_average

Gauge

The average duration of a DPE status check, expressed in seconds.

ataccama_one_dpm_engines_status_check_duration_median

Gauge

The median duration of a DPE status check, expressed in seconds.
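The job status queries shown for ataccama_one_dpm_plugin_executor_jobs_count can also be assembled programmatically, for example when generating dashboards or alert rules. The following is a minimal Python sketch of that idea. The helper name jobs_count_query is hypothetical and not part of the Ataccama ONE Platform; it only builds the query string and does not contact any monitoring server.

```python
# Values of the status tag listed for the DPM jobs gauge.
JOB_STATUSES = (
    "unknown", "submitted", "preprocessing", "queued", "submitted_to_engine",
    "starting", "running", "postprocessing", "success", "failure",
    "killing", "killed", "disconnected", "unrecognized",
)

METRIC = "ataccama_one_dpm_plugin_executor_jobs_count"


def jobs_count_query(*statuses: str) -> str:
    """Build a query string for the given job statuses (hypothetical helper)."""
    unsupported = [s for s in statuses if s not in JOB_STATUSES]
    if unsupported:
        raise ValueError(f"unsupported job status: {unsupported}")
    if not statuses:
        # No filter: the bare metric name selects the series of every status.
        return METRIC
    if len(statuses) == 1:
        # Exact match on a single status.
        status = statuses[0]
        return f'{METRIC}{{status="{status}"}}'
    # Regular expression match on several statuses.
    pattern = "|".join(statuses)
    return f'{METRIC}{{status=~"{pattern}"}}'
```

For example, jobs_count_query("queued") produces the single-status query shown earlier, while jobs_count_query("queued", "running") produces a regular expression matcher that covers both statuses.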

DPE

For Data Processing Engine (DPE), the following metrics are available:

Metric Metric type Description

ataccama_one_dpe_plugin_snowflake_total_processing_time

Timer

Average total processing time of pushdown processing over the previous five runs.

ataccama_one_dpe_plugin_snowflake_deployment_time

Timer

Average time spent on deploying Java functions to Snowflake over the previous five runs.

ataccama_one_dpe_plugin_snowflake_statistics_processing_time

Timer

Average time spent on processing column statistics requirements and translating them to Snowpark DataFrames over the previous five runs.

ataccama_one_dpe_plugin_snowflake_business_rules_processing_time

Timer

Average time spent on processing domain detection rules and translating them to Snowpark DataFrames over the previous five runs.

ataccama_one_dpe_plugin_snowflake_lookups_upload_time

Timer

Average time spent on uploading lookups to Snowflake temporary tables over the previous five runs.

ataccama_one_dpe_plugin_snowflake_group_by_queries_time

Timer

Average time spent on processing and executing a query that gathers frequency statistics over the previous five runs.

ataccama_one_dpe_plugin_snowflake_aggregated_statistics_time

Timer

Average time spent on executing Snowpark DataFrames over the previous five runs.

ataccama_one_dpe_plugin_snowflake_report_build_time

Timer

Average time spent on processing returned results over the previous five runs.

ataccama_one_dpe_plugin_snowflake_incoming_requests_count

Gauge

The number of Snowflake pushdown jobs processed during the specified time window. The window is configurable through the property plugin.snowflake.metrics.requests-statistics-window (default value: 60s).

ataccama_one_dpe_plugin_snowflake_fingerprint_statistics_time

Timer

Average time spent on processing fingerprint requirements and translating them to Snowpark DataFrames over the previous five runs.

ataccama_one_dpe_plugin_executor_jobs_running

Gauge

The number of jobs that are currently running in DPE.

ataccama_one_dpe_constraints_collector_duration

Timer

Measures how long it takes to collect and compute constraints. Static constraints are calculated only once during the first collection.

ataccama_one_dpe_constraints_cached_data_source_session_digests_duration_seconds

Timer

Measures how long it takes to compute constraints based on digests of the cached data source connections.

ataccama_one_dependency_health

Gauge

Health check metrics grouped by dependency and depender (in this case, DPE).

Must be explicitly enabled through the property ataccama.one.dpe.health-checks.expose-as-metrics=true. Other health checks are explicitly enabled using management.endpoint.health.dpm.enabled=true.

Health statuses are mapped as follows:

  • UP = 0

  • DOWN = 1

  • OUT_OF_SERVICE = 2

  • UNKNOWN = 3

  • Other = 4
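When the ataccama_one_dependency_health gauge is consumed by a script, the numeric values need to be translated back to status names. The following is a minimal Python sketch of the mapping listed above. The function names are hypothetical, and treating every value outside 0 to 3 as Other is an assumption of this sketch.

```python
# Numeric values of ataccama_one_dependency_health, as listed in this article.
HEALTH_STATUS_BY_VALUE = {0: "UP", 1: "DOWN", 2: "OUT_OF_SERVICE", 3: "UNKNOWN"}


def decode_health(value: float) -> str:
    """Translate a gauge sample into a status name (hypothetical helper)."""
    # Assumption: any value that is not 0, 1, 2, or 3 is reported as "Other".
    return HEALTH_STATUS_BY_VALUE.get(value, "Other")


def is_healthy(value: float) -> bool:
    """Only UP (0) counts as healthy in this sketch."""
    return decode_health(value) == "UP"
```

A dictionary lookup keeps the mapping in one place, so the same table can be reused for DPM, DPE, and MMM samples.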

MMM

For Metadata Management Module (MMM), the following metrics are available:

Metric Metric type Description

ataccama_one_mmm_mdtransactions_time_seconds_count

Gauge

The number of metadata transactions.

ataccama_one_mmm_mdtransactions_time_seconds_sum

Gauge

The total time spent on transactions.

ataccama_one_mmm_mdtransactions_time_seconds_max

Gauge

The maximum duration of a transaction.

ataccama_one_mmm_mdtransactions_active

Gauge

The number of currently active transactions.

ataccama_one_mmm_mdtransactions_listener_time_seconds_count

Gauge

The total number of times the listener was invoked.

ataccama_one_mmm_mdtransactions_listener_time_seconds_sum

Gauge

The total time spent in the listener.

ataccama_one_mmm_mdtransactions_listener_time_seconds_max

Gauge

The maximum time spent in the listener.

ataccama_one_mmm_domainevents_handler_time_seconds_sum

Gauge

The total time spent in the event handler.

ataccama_one_mmm_domainevents_handler_time_seconds_count

Gauge

The number of events fired.

ataccama_one_mmm_domainevents_handler_time_seconds_max

Gauge

The maximum duration of the handler.

ataccama_one_mmm_dpm_async_pool_active

Gauge

The number of actively running threads in the pool for processing DPx results.

ataccama_one_mmm_dpm_async_pool_queue_size

Gauge

The number of tasks waiting to be processed in the DPx queue.

ataccama_one_mmm_dpm_async_pool_size

Gauge

The current size of the pool for processing DPx results.

ataccama_one_mmm_dpm_events_total

Counter

The total number of events received from DPM and DPE, grouped by type and status.

ataccama_one_mmm_dpm_result_size_bytes

Histogram

The size of DPM results per job type.

ataccama_one_mmm_dpm_result_handler_time_seconds

Histogram

The number of seconds that MMM requires to execute the job success result handler, provided by job type.

ataccama_one_mmm_jobs_duration_total_seconds

Timer

The total duration of jobs grouped by exit.

ataccama_one_mmm_jobs_duration_partial_seconds

Timer

The number of seconds jobs spent in each status, grouped by exitStatus and type.

ataccama_one_mmm_mdoperations_time_seconds

Timer

The total duration of metadata operations, grouped by operation.

ataccama_one_mmm_flow_handler_seconds

Timer

The total time spent in the flow event handler.

ataccama_one_mmm_profiling_duration_seconds

Timer

The duration of profiling, grouped by phase.

ataccama_one_mmm_profiling_count_total

Counter

The number of catalog item attributes that were profiled.

ataccama_one_mmm_metadataimport_entity_total

Counter

The total number of entities affected during a metadata import grouped by type, for example, location, catalogItem, attribute, other, relation, relationsToUpdate, updatedRelation.

ataccama_one_mmm_externalevents_events_pending

Gauge

The number of stored external events (not delivered or not acknowledged).

ataccama_one_mmm_externalevents_events_total

Counter

The total number of all external events.

ataccama_one_mmm_externalevents_subscribers_online

Gauge

The number of external event subscribers that are currently online.

ataccama_one_mmm_externalevents_subscribers_total

Gauge

The total number of external event subscribers.

ataccama_one_grpc_client_call

Timer

The duration of outgoing gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_client_stream_received_total

Counter

The number of received gRPC streaming messages on the client side, grouped by serviceName and method.

ataccama_one_grpc_client_stream_sent_total

Counter

The number of sent gRPC streaming messages on the client side, grouped by serviceName and method.

ataccama_one_grpc_server_request

Timer

The duration of incoming gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_server_stream_received

Counter

The number of received gRPC streaming messages on the server side, grouped by serviceName and method.

ataccama_one_grpc_server_stream_sent

Counter

The number of sent gRPC streaming messages on the server side, grouped by serviceName and method.

ataccama_one_grpc_server_pool_queue_size

Gauge

The number of tasks waiting to be processed in the gRPC server executor queue.

ataccama_one_grpc_server_pool_active

Gauge

The number of active threads in the gRPC server executor.

ataccama_one_grpc_server_pool_size

Gauge

The current number of threads in the gRPC server executor.

ataccama_one_mmm_catalogsearch_es_query_duration

Timer

The duration of catalog item search queries.

ataccama_one_mmm_catalogsearch_events_compacted_total

Counter

The number of all compacted events in the catalog search outbox table (used for optimizing performance).

ataccama_one_mmm_catalogsearch_events_failed

Gauge

The current number of failed events that will not be reprocessed.

ataccama_one_mmm_catalogsearch_events_partly_failed

Gauge

The current number of failed catalog search sync events that will be reprocessed.

ataccama_one_mmm_catalogsearch_events_pending

Gauge

The current number of catalog search sync events waiting to be reprocessed.

ataccama_one_mmm_catalogsearch_events_total

Counter

The total number of events added to the catalog search outbox queue.

ataccama_one_mmm_threadpool_pool_active

Gauge

The number of actively running threads in a pool, grouped by the thread pool name.

ataccama_one_mmm_threadpool_pool_queue_size

Gauge

The number of tasks waiting to be processed in a pool, grouped by the thread pool name.

ataccama_one_mmm_threadpool_pool_size

Gauge

The current pool size, grouped by the thread pool name.

ataccama_one_mmm_activeusers

Gauge

The number of users that are simultaneously working with the application (application node).

graphql_timer_query_seconds

Timer

The duration of GraphQL operations. Operations can be distinguished by the tag operationName.

ataccama_one_dependency_health

Gauge

Health check metrics grouped by dependency and depender (in this case, MMM). Must be explicitly enabled through the property ataccama.one.mmm.health-checks.expose-as-metrics=true.

Other health checks are explicitly enabled using the following properties:

  • management.endpoint.health.one-object-storage.enabled=true

  • management.endpoint.health.dpm.enabled=true

  • management.endpoint.health.ai-core-anomaly.enabled=true

Health statuses are mapped as follows:

  • UP = 0

  • DOWN = 1

  • OUT_OF_SERVICE = 2

  • UNKNOWN = 3

  • Other = 4

ataccama_one_mmm_scheduler_size

Gauge

The current number of threads in the scheduler pool, grouped by the scheduler name.

ataccama_one_mmm_scheduler_active

Gauge

The number of active threads in the scheduler pool, grouped by the scheduler name.

ataccama_one_mmm_scheduler_queue_size

Gauge

The number of tasks waiting to be processed in the scheduler queue, grouped by the scheduler name.

ataccama_one_mmm_scheduler_task_scheduled

Gauge

An approximate total number of tasks that have been scheduled for execution at any point, grouped by the scheduler name.

ataccama_one_mmm_scheduler_task_completed

Gauge

An approximate total number of tasks that have been completed, grouped by the scheduler name.

ataccama_one_mmm_dq_eval_volume_total

Counter

The number of entities involved in DQ evaluation, grouped by parameter (rules, dqChecks, catalogItems, attributes) and type (DQ_EVAL_TERM, DQ_EVAL_PROJECT, DQ_EVAL_CATALOG_ITEM, DQ_EVAL_ATTRIBUTE).

ataccama_one_mmm_lookups_uploaded_files_total

Counter

The number of lookup files (.lkp) that were successfully uploaded using lookup management.
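Several MMM metrics are published as a pair with the suffixes _count and _sum, for example ataccama_one_mmm_mdtransactions_time_seconds_count and ataccama_one_mmm_mdtransactions_time_seconds_sum. A common way to use such a pair is to divide the increase of the sum by the increase of the count between two scrapes, which gives the mean duration over that interval. The following Python sketch illustrates the arithmetic. The Sample class and the mean_duration function are hypothetical, and the sketch assumes that both values are cumulative and only decrease when the service restarts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One scrape of a _count/_sum pair (hypothetical container)."""
    count: float          # value of the ..._seconds_count series
    total_seconds: float  # value of the ..._seconds_sum series


def mean_duration(earlier: Sample, later: Sample) -> Optional[float]:
    """Mean duration, in seconds, of the operations recorded between two scrapes."""
    delta_count = later.count - earlier.count
    delta_sum = later.total_seconds - earlier.total_seconds
    if delta_count <= 0 or delta_sum < 0:
        # No new operations, or the values were reset (for example by a restart).
        return None
    return delta_sum / delta_count
```

For instance, if the count grows from 10 to 14 while the sum grows from 5.0 to 7.0 seconds, the four new transactions took 0.5 seconds on average.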

Catalog Search plugin

The following metrics related to the Catalog Search plugin in MMM, identified here by their suffixes, are particularly important because they can indicate a critical error in the plugin or in its communication with Elasticsearch. We therefore recommend monitoring them proactively, with alerts configured for the listed thresholds.

Metric Threshold Recommended action

*_catalogsearch.events.pending

5000

The suggested threshold corresponds to the expected peak number of pending events during a metadata import. Exceeding it indicates that the application is lagging in processing existing events.

If the queue grew because of a bulk import, the issue typically resolves on its own. If the issue recurs, scale up the processing capacity of the application as needed.

*_catalogsearch.events.partly_failed

101

The suggested threshold corresponds to the size of the event batch that should be processed. Because reaching it indicates that all events in a batch finished with an error, check the MMM log for any technical or connectivity problems between Elasticsearch and the Catalog Search plugin.

If events end up in the failed state instead (because they could not be reprocessed), the application administrator should run the recovery process.

*_catalogsearch.events.failed

1

Check the MMM log for any technical or connectivity problems between Elasticsearch and the Catalog Search plugin. In most cases, the application administrator should run the recovery process.
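The thresholds above can be checked with a few lines of code, for example in a custom health script. The following Python sketch is illustrative only. The function name breached is hypothetical, the samples are passed in as a plain dictionary keyed by metric suffix, and the sketch assumes that an alert should fire as soon as a value reaches its threshold.

```python
# Suggested alert thresholds from this section, keyed by metric suffix.
THRESHOLDS = {
    "catalogsearch.events.pending": 5000,
    "catalogsearch.events.partly_failed": 101,
    "catalogsearch.events.failed": 1,
}


def breached(samples: dict) -> list:
    """Return the suffixes whose current value has reached the suggested threshold."""
    return [
        suffix
        for suffix, limit in THRESHOLDS.items()
        # Assumption: a missing sample is treated as 0, that is, as not breached.
        if samples.get(suffix, 0) >= limit
    ]
```

Keeping the thresholds in a single dictionary makes it easy to adjust them if the expected import volume or batch size differs in a given environment.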

Transaction Data plugin

Metric Labels Description

data.history.transaction.data.fetch

Time taken, in seconds, to fetch transaction data and map it to a GraphQL response in the data history plugin.

anomaly.detection.job.execute

['type','transaction.data']

Time taken to execute anomaly detection on transaction data.

anomaly.detection.transaction.data.update

['type','transaction.data']

Time taken to persist anomaly information of transaction data.

anomaly.detection.history.fetch

['type','transaction.data']

Time taken to fetch transaction data related to an anomaly detection job request.

transaction.data.execute

Time taken to create and submit a transaction data executor job.

transaction.data.import.results

Time taken to import the results of a transaction data executor job.

transaction.data.context.serialize

Time taken to serialize the transaction data executor job context.

transaction.data.db.select

['action','[value]']

Possible values:

  • select.aggregation.results: Select transaction data results without anomalies.

  • select.computed.at.results: Select time of computation for a specific configuration.

  • select.aggregation.results.with.anomalies: Select transaction data results with anomaly information.

  • select.latest.config.hcn: Select HCN of configuration when executed.

  • update.anomalies: Update anomaly information related to transaction data.

  • insert.results: Insert results of transaction data execution.

  • delete.existing.results: Delete results of transaction data execution related to a specific configuration.
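The Transaction Data metrics are listed with dotted names, whereas most other metrics in this article use underscores. The following Python sketch shows how a query for one action of transaction.data.db.select could be assembled. It rests on assumptions that are not confirmed here: that the exported name is obtained simply by replacing dots with underscores, with no additional prefix or unit suffix, and that the label is named action as shown in the table. Both function names are hypothetical.

```python
# Values of the action label listed for transaction.data.db.select.
DB_ACTIONS = (
    "select.aggregation.results",
    "select.computed.at.results",
    "select.aggregation.results.with.anomalies",
    "select.latest.config.hcn",
    "update.anomalies",
    "insert.results",
    "delete.existing.results",
)


def to_underscore_name(dotted_name: str) -> str:
    """Replace dots with underscores (assumed naming convention)."""
    return dotted_name.replace(".", "_")


def db_action_selector(action: str) -> str:
    """Build a selector for one database action (hypothetical helper)."""
    if action not in DB_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    name = to_underscore_name("transaction.data.db.select")
    return f'{name}{{action="{action}"}}'
```

Check the metric names actually exposed by your deployment before relying on the generated selector.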

Anomaly Detection

Component Metric Metric type Description Labels

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_ad_stream_requests_total

Counter

The number of stream requests for anomaly detection issued from MMM.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_ad_unary_requests_total

Counter

The number of unary requests for anomaly detection issued from MMM.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__duration_of_ad_request_seconds

Histogram

A histogram representing the duration of anomaly detection requests over all provided categories.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__duration_of_ad_per_model_on_one_category_seconds

Summary

The number of seconds needed to complete anomaly detection processing for the chosen model and the given category.

['model_type', 'category_type', 'detection_on_full_history']

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_data_points_total

Summary

The number of data points (for example, profiling versions) that were fetched from MMM.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_positive_anomaly_feedbacks_total

Summary

The number of confirmed anomalies (feedback) that users provided.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_detected_anomalous_data_points_total

Summary

The number of data points that the model identified as anomalous.

['model_type', 'detection_on_full_history']

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__n_features_total

Summary

The number of features (metrics) sent for the anomaly detection.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__duration_of_isolation_forest_fit_method

Summary

The number of seconds needed to fit the Isolation Forest model.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__duration_of_isolation_forest_explainability_method

Summary

The number of seconds needed to run the Isolation Forest to obtain the explainability of anomalies.

[]

Anomaly Detector

ataccama_one_anomaly_detection_anomaly_detector__duration_of_isolation_forest_frequencies_cut_off

Summary

The number of seconds needed to run the frequency cutoff method in the Isolation Forest model.

[]

gRPC Server

ataccama_one_anomaly_detection_anomaly_detector_grpc_server_auth_failures_total

Counter

The total number of gRPC requests with authentication failures.

[]

gRPC Server

ataccama_one_anomaly_detection_anomaly_detector_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_anomaly_detection_anomaly_detector_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_anomaly_detection_anomaly_detector_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

Microservice

ataccama_one_anomaly_detection_anomaly_detector_microservice_microservice

Info

The microservice details.

[]

WSGI Server

ataccama_one_anomaly_detection_anomaly_detector_wsgi_server_auth_failures_total

Counter

The total number of HTTP requests with authentication failures.

[]

WSGI Server

ataccama_one_anomaly_detection_anomaly_detector_wsgi_server_requests_total

Counter

The total number of HTTP requests, grouped by status code.

['status']
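Because the WSGI server request counter carries a status label, it can be used to derive an error ratio. The following Python sketch computes the share of requests that were answered with a 5xx status code. The function name is hypothetical, the counts are passed in as a plain dictionary, and the sketch assumes that the status label holds the numeric HTTP status code.

```python
def server_error_ratio(requests_by_status: dict) -> float:
    """Share of requests answered with a 5xx status code (0.0 if there were no requests)."""
    total = sum(requests_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(
        count
        for status, count in requests_by_status.items()
        # Assumption: the label value is the status code, for example "500".
        if str(status).startswith("5")
    )
    return errors / total
```

When working with cumulative counters, pass in the increases observed over a time window rather than the raw totals, so that the ratio reflects recent traffic.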

Term Suggestions

Feedback

Component Metric Metric type Description Labels

Feedback

ataccama_one_term_suggestions_feedback__feedbacks_total

Counter

The total number of positive or negative feedback items received from users.

['type']

Feedback

ataccama_one_term_suggestions_feedback__thresholds

Histogram

The current distance thresholds.

[]

Database

ataccama_one_term_suggestions_feedback_database_query_seconds

Summary

The number of seconds a database query takes to complete.

['operation']

gRPC Server

ataccama_one_term_suggestions_feedback_grpc_server_auth_failures_total

Counter

The total number of gRPC requests with authentication failures.

[]

gRPC Server

ataccama_one_term_suggestions_feedback_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_term_suggestions_feedback_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_term_suggestions_feedback_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

Microservice

ataccama_one_term_suggestions_feedback_microservice_microservice

Info

The microservice details.

[]

WSGI Server

ataccama_one_term_suggestions_feedback_wsgi_server_auth_failures_total

Counter

The total number of HTTP requests with authentication failures.

[]

WSGI Server

ataccama_one_term_suggestions_feedback_wsgi_server_requests_total

Counter

The total number of HTTP requests, grouped by status code.

['status']

Neighbors

Component Metric Metric type Description Labels

Neighbors

ataccama_one_term_suggestions_neighbors__database_attributes_present

Gauge

The number of attributes available to the Term Suggestions microservices.

The value might be overestimated.

[]

Neighbors

ataccama_one_term_suggestions_neighbors__index_attributes_present

Gauge

The number of attributes currently stored in the memory.

[]

Neighbors

ataccama_one_term_suggestions_neighbors__index_attributes_limit

Gauge

The maximum number of attributes that can be stored in the memory.

[]

Neighbors

ataccama_one_term_suggestions_neighbors__neighbors_distances

Histogram

Distances to k-th nearest neighbors.

['k']

Database

ataccama_one_term_suggestions_neighbors_database_query_seconds

Summary

The number of seconds a database query takes to complete.

['operation']

gRPC Server

ataccama_one_term_suggestions_neighbors_grpc_server_auth_failures_total

Counter

The total number of gRPC requests with authentication failures.

[]

gRPC Server

ataccama_one_term_suggestions_neighbors_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_term_suggestions_neighbors_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_term_suggestions_neighbors_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

Microservice

ataccama_one_term_suggestions_neighbors_microservice_microservice

Info

The microservice details.

[]

WSGI Server

ataccama_one_term_suggestions_neighbors_wsgi_server_auth_failures_total

Counter

The total number of HTTP requests with authentication failures.

[]

WSGI Server

ataccama_one_term_suggestions_neighbors_wsgi_server_requests_total

Counter

The total number of HTTP requests, grouped by status code.

['status']
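The two index gauges of the Neighbors microservice, index_attributes_present and index_attributes_limit, can be combined into a single utilization figure. The following Python sketch shows the calculation. The function names and the warn_at default of 0.9 are hypothetical choices for illustration, not recommended values.

```python
def index_utilization(attributes_present: float, attributes_limit: float) -> float:
    """Fraction of the in-memory attribute capacity that is currently in use."""
    if attributes_limit <= 0:
        raise ValueError("attributes_limit must be positive")
    return attributes_present / attributes_limit


def near_capacity(attributes_present: float, attributes_limit: float,
                  warn_at: float = 0.9) -> bool:
    """True when utilization has reached the warn_at fraction (assumed threshold)."""
    return index_utilization(attributes_present, attributes_limit) >= warn_at
```

For example, 750 attributes stored against a limit of 1000 gives a utilization of 0.75, which is below the assumed warning level.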

Recommender

Component Metric Metric type Description Labels

Recommender

ataccama_one_term_suggestions_recommender__attributes_processed_total

Counter

The number of attributes for which suggestions were computed.

[]

Recommender

ataccama_one_term_suggestions_recommender__suggestions_created_total

Counter

The number of suggestions created.

[]

Recommender

ataccama_one_term_suggestions_recommender__terms_known

Gauge

The number of known terms.

[]

Recommender

ataccama_one_term_suggestions_recommender__terms_disabled

Gauge

The number of disabled terms.

[]

Recommender

ataccama_one_term_suggestions_recommender__recommendation_starts_total

Counter

The number of times all suggestions were rendered outdated.

[]

Recommender

ataccama_one_term_suggestions_recommender__recommendation_finishes_total

Counter

The number of times all suggestions were brought up to date.

[]

Recommender

ataccama_one_term_suggestions_recommender__recommendation_progress

Gauge

The number of attributes that have up-to-date suggestions.

[]

Recommender

ataccama_one_term_suggestions_recommender__recommendation_progress_with_ground_truth

Gauge

The number of attributes that have up-to-date suggestions and for which the ground truth is known.

[]

Recommender

ataccama_one_term_suggestions_recommender__suggestions_confusion_matrix

Gauge

The confusion matrix computed between suggestions and assigned terms.

['entry']

Database

ataccama_one_term_suggestions_recommender_database_query_seconds

Summary

The number of seconds a database query takes to complete.

['operation']

gRPC Client

ataccama_one_term_suggestions_recommender_grpc_client_query_seconds

Summary

The number of seconds a gRPC query takes to complete.

[]

gRPC Server

ataccama_one_term_suggestions_recommender_grpc_server_auth_failures_total

Counter

The total number of gRPC requests with authentication failures.

[]

gRPC Server

ataccama_one_term_suggestions_recommender_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_term_suggestions_recommender_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_term_suggestions_recommender_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

Microservice

ataccama_one_term_suggestions_recommender_microservice_microservice

Info

The microservice details.

[]

WSGI Server

ataccama_one_term_suggestions_recommender_wsgi_server_auth_failures_total

Counter

The total number of HTTP requests with authentication failures.

[]

WSGI Server

ataccama_one_term_suggestions_recommender_wsgi_server_requests_total

Counter

The total number of HTTP requests, grouped by status code.

['status']

AI Matching

Matching Manager

Component Metric Type Description Labels

Background thread

ataccama_one_ai_matching_background_thread_processing_failures_total

Counter

The total number of failures during the infinite processing loop.

['thread_name', 'error_class']

Background thread

ataccama_one_ai_matching_background_thread_processing_seconds

Summary

The processing time of a single work unit computed in the infinite processing loop, expressed in seconds.

['thread_name']

Background thread

ataccama_one_ai_matching_background_thread_run_failures_total

Counter

The total number of failures raised during the whole background thread run method.

['thread_name', 'error_class']

Background thread

ataccama_one_ai_matching_background_thread_run_seconds

Summary

The processing time of a whole background thread run method, expressed in seconds.

['thread_name']

Database

ataccama_one_ai_matching_database_query_seconds

Summary

The number of seconds a database query takes to complete.

['operation']

gRPC Client

ataccama_one_ai_matching_grpc_client_query_seconds

Summary

The number of seconds a gRPC query takes to complete.

[]

gRPC Server

ataccama_one_ai_matching_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_ai_matching_grpc_server_failures_total

Counter

The total number of gRPC requests with failures.

['error_class']

gRPC Server

ataccama_one_ai_matching_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_ai_matching_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

gRPC Server

ataccama_one_ai_matching_grpc_server_auth_failures_total

Counter

Deprecated, use failures_total instead. The total number of gRPC requests with authentication failures.

[]

HTTP Server

ataccama_one_ai_matching_http_server_failures_total

Counter

The total number of HTTP requests with failures.

['error_class']

HTTP Server

ataccama_one_ai_matching_http_server_requests_total

Counter

The total number of HTTP requests, grouped by status code.

['status']

HTTP Server

ataccama_one_ai_matching_http_server_processing_seconds

Summary

The processing time of an HTTP request, expressed in seconds.

['path']

Job

ataccama_one_ai_matching__job_time_seconds

Histogram

The total execution time of a job, expressed in seconds.

['matching_id', 'job_type']

Job

ataccama_one_ai_matching__jobs_total

Counter

The number of executed jobs.

['matching_id', 'job_type']

Microservice

ataccama_one_ai_matching_microservice_microservice

Info

The microservice details.

[]

Model

ataccama_one_ai_matching__labeled_pairs_total

Gauge

The total number of labeled pairs.

['matching_id', 'pair_label']

Model

ataccama_one_ai_matching__model_quality

Gauge

The model quality represented as a floating point value between 0 and 1.

['matching_id']

Matching Worker

Component Metric Type Description Labels

Background thread

ataccama_one_ai_matching_background_thread_processing_failures_total

Counter

The total number of failures during the infinite processing loop.

['thread_name', 'error_class']

Background thread

ataccama_one_ai_matching_background_thread_processing_seconds

Summary

The processing time of a single work unit computed in the infinite processing loop, expressed in seconds.

['thread_name']

Background thread

ataccama_one_ai_matching_background_thread_run_failures_total

Counter

The total number of failures raised during the whole background thread run method.

['thread_name', 'error_class']

Background thread

ataccama_one_ai_matching_background_thread_run_seconds

Summary

The processing time of a whole background thread run method, expressed in seconds.

['thread_name']

Database

ataccama_one_ai_matching_database_query_seconds

Summary

The number of seconds a database query takes to complete.

['operation']

gRPC Client

ataccama_one_ai_matching_grpc_client_query_seconds

Summary

The number of seconds a gRPC query takes to complete.

[]

gRPC Server

ataccama_one_ai_matching_grpc_server_commands_total

Counter

The total number of gRPC commands received.

['type']

gRPC Server

ataccama_one_ai_matching_grpc_server_failures_total

Counter

The total number of gRPC requests with failures.

['error_class']

gRPC Server

ataccama_one_ai_matching_grpc_server_processing_seconds

Summary

The processing time of a gRPC request, expressed in seconds.

['stage']

gRPC Server

ataccama_one_ai_matching_grpc_server_queue_size

Gauge

The number of active RPCs, either queued or currently processed.

[]

gRPC Server

ataccama_one_ai_matching_grpc_server_auth_failures_total

Counter

Deprecated; use ataccama_one_ai_matching_grpc_server_failures_total instead. The total number of gRPC requests with authentication failures.

[]

HTTP Server

ataccama_one_ai_matching_http_server_failures_total

Counter

The total number of HTTP requests with failures.

['error_class']

HTTP Server

ataccama_one_ai_matching_http_server_requests_total

Counter

The total number of HTTP requests, labeled by response status code.

['status']

HTTP Server

ataccama_one_ai_matching_http_server_processing_seconds

Summary

The processing time of an HTTP request, expressed in seconds.

['path']

Job

ataccama_one_ai_matching__job_time_seconds

Histogram

The total execution time of a job, expressed in seconds.

['matching_id', 'job_type']

Job

ataccama_one_ai_matching__jobs_total

Counter

The number of executed jobs.

['matching_id', 'job_type']

Microservice

ataccama_one_ai_matching_microservice_microservice

Info

The microservice details.

[]
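To see which kinds of errors occur most often, the failure counters can be broken down by their labels. The following queries are a minimal sketch; the 5-minute window is an assumption and should be adjusted to your scrape interval.

```promql
# Per-second rate of failed gRPC requests, broken down by error class.
sum by (error_class) (
  rate(ataccama_one_ai_matching_grpc_server_failures_total[5m])
)

# Per-second rate of failures in the background processing loop, per thread.
sum by (thread_name) (
  rate(ataccama_one_ai_matching_background_thread_processing_failures_total[5m])
)
```

Because both metrics are counters, `rate()` is applied before aggregating so that counter resets after a restart are handled correctly.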

JVM

Metric Metric type Description

jvm_gc_pause_seconds_sum

jvm_gc_pause_seconds_count

Summary

The duration of garbage collector pauses.

jvm_memory_max_bytes

Gauge

The maximum amount of memory that is available for memory management, expressed in bytes.

hikaricp_connections_idle

Gauge

The number of idle connections in the connection pool.
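To get the average duration of a garbage collector pause, you can divide the rate of the `_sum` series by the rate of the `_count` series. The following query is a sketch; the 5-minute window is an assumption.

```promql
# Average GC pause duration in seconds over the last 5 minutes.
sum(rate(jvm_gc_pause_seconds_sum[5m]))
/
sum(rate(jvm_gc_pause_seconds_count[5m]))
```

If no garbage collection occurred within the window, the division yields NaN and most dashboards display no data.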

PostgreSQL

Metric Metric type Description

pg_stat_database_deadlocks

Counter

The number of deadlocks detected in the database.

pg_stat_activity_max_tx_duration

Gauge

The maximum duration of an active transaction.

pg_settings_max_connections

Gauge

The configured maximum number of concurrent connections.

pg_stat_activity_count

Gauge

The number of active connections.
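To estimate how close the database is to its connection limit, the two connection metrics can be combined. The following queries are a sketch that assumes both series carry a common `instance` label and that there is one `pg_settings_max_connections` series per instance; the actual label sets depend on how the PostgreSQL exporter is configured.

```promql
# Fraction of the configured connection limit currently in use, per instance.
# Assumption: both metrics share an `instance` label.
sum by (instance) (pg_stat_activity_count)
/ on (instance)
pg_settings_max_connections

# Databases that reported at least one deadlock in the last hour.
increase(pg_stat_database_deadlocks[1h]) > 0
```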

MDM

Metric Metric type Description

ataccama_one_grpc_server_request_seconds_max

Gauge

The maximum duration of incoming gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_server_request_seconds

Summary

The duration of incoming gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_client_pool_active

Gauge

Number of active clients in the gRPC pool.

ataccama_one_grpc_client_call_seconds_max

Gauge

The maximum duration of outgoing gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_client_call_seconds_count

Counter

The number of outgoing gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_client_call_seconds_sum

Gauge

The total duration of outgoing gRPC calls, grouped by serviceName, method, status, and type.

ataccama_one_grpc_server_stream_received_total

Counter

The number of received gRPC streaming messages on the server side, grouped by serviceName and method.

ataccama_one_grpc_server_exceptions_count_total

Counter

Total number of exceptions on gRPC request processing.

ataccama_one_grpc_client_stream_sent_total

Counter

The number of sent gRPC streaming messages on the client side, grouped by serviceName and method.

ataccama_one_http_server_exceptions_count_total

Counter

Total number of exceptions on HTTP request processing.

ataccama_one_http_server_request_seconds

Summary

The duration of incoming HTTP calls, grouped by uri, method, and response.

ataccama_one_http_server_request_seconds_count

Counter

Total number of incoming HTTP calls, grouped by uri, method, and response.

ataccama_one_http_server_request_seconds_sum

Gauge

Total duration of incoming HTTP calls, grouped by uri, method, and response.

ataccama_one_http_server_request_seconds_max

Gauge

Maximum time spent in incoming HTTP call processing, grouped by uri, method, and response.

ataccama_one_mda_service_request_seconds_count

Counter

Total number of calls to the MDM webapp layer, grouped by method.

ataccama_one_mda_service_request_seconds_sum

Gauge

Total duration of calls to the MDM webapp layer, grouped by method.

ataccama_one_mda_service_request_seconds_max

Gauge

Maximum time spent in calls to the MDM webapp layer, grouped by method.

tomcat_sessions_alive_max_seconds

Gauge

Maximum duration of an alive Tomcat session, in seconds.

tomcat_sessions_active_current_sessions

Gauge

Current number of active Tomcat sessions.

jvm_buffer_total_capacity_bytes

Gauge

An estimate of the total capacity of the buffers in this pool.

jvm_gc_live_data_size_bytes

Gauge

Size of long-lived heap memory pool after reclamation.

jvm_gc_pause_seconds

Summary

Time spent in GC pause.

jvm_gc_pause_seconds_max

Gauge

Maximum time spent in GC pause.

jvm_memory_used_bytes

Gauge

The amount of memory used.

jvm_gc_memory_allocated_bytes_total

Counter

The total number of bytes allocated in the (young) heap memory pool, incremented by the growth of the pool between one GC and the next.

process_uptime_seconds

Gauge

The uptime of the Java Virtual Machine (JVM).

process_cpu_usage

Gauge

The recent CPU usage for the JVM process.

system_cpu_count

Gauge

The number of processors available to the JVM.
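The `_sum` and `_count` series can be combined to get an average duration. For example, to retrieve the average duration of incoming HTTP calls for each URI, you can use a query similar to the following. This is a sketch; the 5-minute window is an assumption.

```promql
# Average duration of incoming HTTP calls in seconds, per URI, over the last 5 minutes.
sum by (uri) (rate(ataccama_one_http_server_request_seconds_sum[5m]))
/
sum by (uri) (rate(ataccama_one_http_server_request_seconds_count[5m]))
```

The same pattern applies to the `ataccama_one_grpc_client_call_seconds` and `ataccama_one_mda_service_request_seconds` series, with the grouping label replaced accordingly (for example, `method`).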

RDM

For ONE RDM, the endpoint publishes the metrics as follows:

Metric Metric type Description Labels

rdm_number_of_active_sqlStatements_running

Gauge

The number of SQL statements in progress.

[]

rdm_number_of_active_methods_running

Gauge

The number of methods in progress.

[]

rdm_number_of_acquired_locks

Gauge

The number of acquired locks.

[]

rdm_number_of_waiting_locks

Gauge

The number of requested but not acquired locks.

[]

ataccama_one_business_ready

Gauge

The indicator of whether RDM Web App is business ready, where 0 means not ready and 1 means ready.

[]
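As an example of how these gauges can be used for alerting, the following expressions return a result only when the condition holds. This is a sketch; the 5-minute window is an assumption, and in an environment where several modules expose the same metric you might need an additional label filter.

```promql
# Returns a series while RDM Web App reports that it is not business ready.
ataccama_one_business_ready == 0

# Returns a series if at least one lock was waiting at every sample in the last 5 minutes.
min_over_time(rdm_number_of_waiting_locks[5m]) > 0
```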
