
Monitoring Configuration

This article describes how monitoring is configured in Ataccama ONE modules.

All properties mentioned in the article are supplied through the <module>/etc/application.properties file.

Monitoring endpoints and probes

For monitoring purposes, ONE exposes the following endpoints:

  • /actuator/info

  • /actuator/health

  • /actuator/health/liveness

  • /actuator/health/readiness

  • /actuator/prometheus

By default, all endpoints are enabled for each module. Authorization is disabled for all endpoints except the Prometheus endpoint.

If default values are used during installation, the endpoints are available at the following URLs:

  • Metadata Management Module (MMM): http://localhost:8021

  • ONE Web Application: http://localhost:8023

  • Data Processing Module (DPM): http://localhost:8031

  • Data Processing Engine (DPE): http://localhost:8034

  • MDM Server: http://localhost:8051

  • MDM Webapp: http://localhost:8050
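
As a quick illustration, the default base URLs above can be combined with the endpoint paths to list every monitoring URL of a module. The following Python sketch is not part of ONE: the dictionary keys and the function name monitoring_urls are placeholders chosen for this example, and the ports apply only if the installation defaults were kept.

```python
# Default base URLs as listed above; adjust them if your installation differs.
DEFAULT_BASE_URLS = {
    "mmm": "http://localhost:8021",
    "one-webapp": "http://localhost:8023",
    "dpm": "http://localhost:8031",
    "dpe": "http://localhost:8034",
    "mdm-server": "http://localhost:8051",
    "mdm-webapp": "http://localhost:8050",
}

# Endpoint paths exposed for monitoring purposes.
ENDPOINT_PATHS = [
    "/actuator/info",
    "/actuator/health",
    "/actuator/health/liveness",
    "/actuator/health/readiness",
    "/actuator/prometheus",
]


def monitoring_urls(module: str) -> list[str]:
    """Return the full monitoring URLs for one module (hypothetical helper)."""
    base = DEFAULT_BASE_URLS[module].rstrip("/")
    return [base + path for path in ENDPOINT_PATHS]


if __name__ == "__main__":
    for url in monitoring_urls("dpm"):
        print(url)
```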

Info endpoint

The endpoint is enabled through the following property:

management.endpoint.info.enabled=true

The /actuator/info endpoint returns general information about the application. For Java modules, for example, this covers build and Git information.

Depending on the module type, requests are expected to return the following response body structures:

  • Java modules - All current Java applications return detailed information about the build and the commits. Git properties are provided to applications through a Gradle plugin.

    Response body example
    {
       "git":{
          "branch":"master",
          "commit":{
             "time":"2020-09-18T17:34:41Z",
             "message":{
                "full":"Merge branch 'release-13.0-6' [increase-minor]\n",
                "short":"Merge branch 'release-13.0-6' [increase-minor]"
             },
             "id":{
                "describe":"release-1.4.0-1-g49783b4",
                "abbrev":"49783b4",
                "full":"49783b4cd9c04a9148c309f6c33a1d4753344ef2"
             },
             "user":{
                "email":"john.smith@ataccama.com",
                "name":"john.smith"
             }
          },
          "build":{
             "version":"1.5.0-master.178-g49783b4-SNAPSHOT",
             "user":{
                "name":"ata.jenkins",
                "email":"ata.jenkins@ataccama.com"
             },
             "host":"96b5b38c19f4"
          },
          "dirty":"false",
          "tags":"",
          "total":{
             "commit":{
                "count":"178"
             }
          },
          "closest":{
             "tag":{
                "commit":{
                   "count":"1"
                },
                "name":"release-1.4.0"
             }
          },
          "remote":{
             "origin":{
                "url":"ssh://git@bitbucket.atc.services:7999/MDD/one-metadata-web-server.git"
             }
          }
       },
       "build":{
    
       }
    }
  • Python modules - The response format is a custom implementation.

    Response body example
    {
        "app": {
            "description": "Ataccama One 2.0 - AI Core - neighbors",
            "version": "13.0.0-rc4",
            "microservice": "neighbors"
        }
    }
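
Because the two response shapes differ, a monitoring script usually needs to branch on the top-level key. The following Python sketch shows one way to do that, assuming the response bodies look like the examples above. The function name summarize_info and the keys of the returned dictionary are hypothetical.

```python
def summarize_info(body: dict) -> dict:
    """Extract a few identifying fields from an /actuator/info response.

    Handles the two shapes shown above: a "git" object (Java modules)
    or an "app" object (Python modules). Missing fields come back as None.
    """
    if "git" in body:
        git = body["git"]
        return {
            "kind": "java",
            "version": git.get("build", {}).get("version"),
            "commit": git.get("commit", {}).get("id", {}).get("abbrev"),
            "branch": git.get("branch"),
        }
    if "app" in body:
        app = body["app"]
        return {
            "kind": "python",
            "version": app.get("version"),
            "microservice": app.get("microservice"),
        }
    # Neither shape matched; report that instead of guessing.
    return {"kind": "unknown"}
```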

Health endpoint

The endpoint is enabled through the following property:

management.endpoint.health.enabled=true

Requests made to the /actuator/health endpoint return information about the state of components and their dependencies, such as database status or disk space:

  • Java modules

    Response body - general example
    {
        "status": "UP",
        "components": {
            "context": {
                "status": "UP",
                "details": {
                    "startupDate": "2021-03-03T12:25:57.638Z"
                }
            },
            "db": {
                "status": "UP",
                "details": {
                    "database": "PostgreSQL",
                    "validationQuery": "isValid()"
                }
            },
            "diskSpace": {
                "status": "UP",
                "details": {
                    "total": 1023226937344,
                    "free": 607449665536,
                    "threshold": 10485760,
                    "exists": true
                }
            },
            "livenessState": {
                "status": "UP"
            },
            "model": {
                "status": "UP",
                "details": {
                    "modelStatus": "UP",
                    "modelVersion": 2
                }
            },
            "ping": {
                "status": "UP"
            },
            "readinessState": {
                "status": "UP"
            }
        },
        "groups": [
            "liveness",
            "readiness"
        ]
    }
    Response body - MDM-specific example
    {
       "status":"UP",
       "components":{
          "ai":{
             "status":"DOWN"
          },
          "keycloak":{
             "status":"UP"
          },
          "mmm":{
             "status":"UP"
          },
       ...
       }
    }
    The overall status remains UP even if one of the components is DOWN.
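
Since the overall status can stay UP while a single component is DOWN, it can be useful to inspect the components individually. The following Python sketch lists every component whose status is not UP, assuming a response body structured like the examples above. The function name components_not_up is hypothetical.

```python
def components_not_up(health: dict) -> dict:
    """Map component name to status for every component that is not UP."""
    components = health.get("components", {})
    return {
        name: component.get("status")
        for name, component in components.items()
        if component.get("status") != "UP"
    }


if __name__ == "__main__":
    # Shape taken from the MDM-specific example above (elided parts omitted).
    mdm_health = {
        "status": "UP",
        "components": {
            "ai": {"status": "DOWN"},
            "keycloak": {"status": "UP"},
            "mmm": {"status": "UP"},
        },
    }
    print(components_not_up(mdm_health))
```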

Liveness probe

The following properties configure the /actuator/health/liveness endpoint:

Property Data type Description

management.endpoint.health.probes.enabled

Boolean

Enables /actuator/health/liveness and /actuator/health/readiness endpoints.

Default value: true.

management.endpoint.health.group.liveness.include

String

Defines which components are covered by the liveness probe. These components are a subset of /actuator/health components.

Default value: diskSpace,ping.

Requests made to the /actuator/health/liveness endpoint are expected to return the following HTTP response codes based on the liveness status:

  • Status CORRECT: A 200 OK response indicating that the module is alive.

  • Status N/A or BROKEN: A response in the 4xx or 5xx range if the module cannot be reached or there is a failure. No response also indicates an issue with the module.

The status returned is UP only if the status of all components is UP as well.

Response body examples
Status code
{
   "status":"UP"
}
Health of dependencies
{
    "status": "UP",
    "components": {
        "diskSpace": {
            "status": "UP",
            "details": {
                "total": 1023226937344,
                "free": 607881547776,
                "threshold": 10485760,
                "exists": true
            }
        },
        "ping": {
            "status": "UP"
        }
    }
}
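
The response code rules described above can be checked with a small client. The following Python sketch uses only the standard library. It treats a 200 OK response as alive and anything else, including no response at all, as a problem. The function names and the three-second timeout are assumptions made for this example.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status_code(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def is_alive(status_code: Optional[int]) -> bool:
    """Only a 200 OK response indicates that the module is alive."""
    return status_code == 200


if __name__ == "__main__":
    # Default DPM address; valid only if the installation defaults were kept.
    code = fetch_status_code("http://localhost:8031/actuator/health/liveness")
    print("alive" if is_alive(code) else f"not alive (status: {code})")
```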

Readiness probe

The following properties configure the /actuator/health/readiness endpoint:

Property Data type Description

management.endpoint.health.probes.enabled

Boolean

Enables /actuator/health/liveness and /actuator/health/readiness endpoints.

Default value: true.

management.endpoint.health.group.readiness.include

String

Defines which components are covered by the readiness probe. These components are a subset of /actuator/health components.

Default value: diskSpace,ping.

Default value for DPM and DPE: db,plugins.

Default value for MMM: db,model,context.

Depending on the readiness status, the following HTTP responses are returned:

  • Status UP: 200 OK response. The status returned is UP only if the status of all components and dependencies is UP as well.

  • Status OUT_OF_SERVICE: 503 Service Unavailable response. This happens when a component or a subsystem of components is out of service and the application should therefore not accept traffic.

  • Status DOWN: 503 Service Unavailable response. This is typically caused by an unexpected failure.

  • Status UNKNOWN: 200 OK response.

Response body examples
Status code
{
   "status":"UP|OUT_OF_SERVICE|DOWN|UNKNOWN"
}
Health of dependencies
{
   "status":"UP",
   "components":{
      "db":{
         "status":"UP",
         "details":{
            "database":"PostgreSQL",
            "validationQuery":"isValid()"
         }
      }
   }
}
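
Note that both UP and UNKNOWN map to 200 OK, so a check based only on the HTTP code cannot tell them apart. The following Python sketch encodes the mapping described above and offers an optional strict mode that also reads the response body. The names READINESS_HTTP_CODES and is_ready and the strict flag are hypothetical.

```python
import json
from typing import Optional

# Readiness status to HTTP response code, as described above.
READINESS_HTTP_CODES = {
    "UP": 200,
    "OUT_OF_SERVICE": 503,
    "DOWN": 503,
    "UNKNOWN": 200,
}


def is_ready(http_code: int, body: Optional[str] = None, strict: bool = False) -> bool:
    """Decide whether a module should receive traffic.

    In the default mode, any 200 response counts as ready.
    In strict mode, the body must also report the status UP.
    """
    if http_code != 200:
        return False
    if not strict:
        return True
    if body is None:
        return False
    return json.loads(body).get("status") == "UP"
```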

Prometheus endpoint

The following properties configure the /actuator/prometheus endpoint:

Property Data type Description

management.endpoint.prometheus.enabled

Boolean

Enables the /actuator/prometheus endpoint.

Default value: true.

ataccama.authentication.http.acl.endpoints.prometheus.endpoint-filter

String

Enables ACL-based authentication on the selected endpoint. The same filter can be enabled on other endpoints.

Default value: /actuator/prometheus.

ataccama.authentication.http.acl.endpoints.prometheus.allowed-roles

String

Allows access to the endpoint defined in the endpoint-filter property for the selected user roles.

Default value: ONE_PLATFORM_MONITORING.

By default, the endpoint is secured using HTTP basic authentication and only the monitoring role is allowed to communicate with it. The only defined response status code is 200 OK.
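
As an illustration of the default security setup, the following Python sketch builds a request to the endpoint with an HTTP basic authentication header using only the standard library. The function name and the credentials are placeholders. The user is assumed to hold a role listed in the allowed-roles property.

```python
import base64
import urllib.request


def prometheus_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /actuator/prometheus with a basic auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/actuator/prometheus",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    # Placeholder credentials and the default MMM address.
    request = prometheus_request("http://localhost:8021", "monitoring-user", "change-me")
    with urllib.request.urlopen(request, timeout=5) as response:
        print(response.status)
```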

Requests made to the Prometheus endpoint return application metrics with relevant information about the running module. Currently, only general processing information is provided, such as RAM and CPU usage. However, it is possible to provide specific runtime information for each ONE module as well.

For an overview of module-specific monitoring metrics, see Monitoring Metrics.

Depending on how the module is implemented, you can expect the following response body structure:

  • Java modules - Java modules use metrics provided by Micrometer, a tool that is automatically integrated with Spring Boot Actuator.

    Response body example
    # HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
    # TYPE jvm_gc_max_data_size_bytes gauge
    jvm_gc_max_data_size_bytes 1.44048128E8
    # HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
    # TYPE process_cpu_usage gauge
    process_cpu_usage 0.001402737059119283
    # HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
    # TYPE jvm_buffer_count_buffers gauge
    jvm_buffer_count_buffers{id="mapped",} 0.0
    jvm_buffer_count_buffers{id="direct",} 10.0
    # HELP tomcat_sessions_active_current_sessions
    # TYPE tomcat_sessions_active_current_sessions gauge
    tomcat_sessions_active_current_sessions 0.0
    # HELP tomcat_sessions_rejected_sessions_total
    # TYPE tomcat_sessions_rejected_sessions_total counter
    tomcat_sessions_rejected_sessions_total 0.0
    # HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
    # TYPE jvm_memory_committed_bytes gauge
    jvm_memory_committed_bytes{area="heap",id="Tenured Gen",} 3.2444416E7
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.80224E7
    jvm_memory_committed_bytes{area="heap",id="Eden Space",} 1.31072E7
    jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 6.6060288E7
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
    jvm_memory_committed_bytes{area="heap",id="Survivor Space",} 1572864.0
    jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 8388608.0
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 4980736.0
    # HELP system_cpu_count The number of processors available to the Java virtual machine
    # TYPE system_cpu_count gauge
    system_cpu_count 1.0
    # HELP process_start_time_seconds Start time of the process since unix epoch.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.602080758644E9
    # HELP logback_events_total Number of error level events that made it to the logs
    # TYPE logback_events_total counter
    logback_events_total{level="warn",} 2.0
    logback_events_total{level="debug",} 0.0
    logback_events_total{level="error",} 1.0
    logback_events_total{level="trace",} 0.0
    logback_events_total{level="info",} 73.0
    # HELP tomcat_sessions_created_sessions_total
    # TYPE tomcat_sessions_created_sessions_total counter
    tomcat_sessions_created_sessions_total 0.0
    # HELP jvm_threads_states_threads The current number of threads having NEW state
    # TYPE jvm_threads_states_threads gauge
    jvm_threads_states_threads{state="runnable",} 7.0
    jvm_threads_states_threads{state="blocked",} 0.0
    jvm_threads_states_threads{state="waiting",} 18.0
    jvm_threads_states_threads{state="timed-waiting",} 6.0
    jvm_threads_states_threads{state="new",} 0.0
    jvm_threads_states_threads{state="terminated",} 0.0
    # HELP process_files_max_files The maximum file descriptor count
    # TYPE process_files_max_files gauge
    process_files_max_files 1048576.0
    # HELP tomcat_sessions_active_max_sessions
    # TYPE tomcat_sessions_active_max_sessions gauge
    tomcat_sessions_active_max_sessions 0.0
    # HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
    # TYPE jvm_gc_live_data_size_bytes gauge
    jvm_gc_live_data_size_bytes 1.9466048E7
    # HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
    # TYPE jvm_classes_loaded_classes gauge
    jvm_classes_loaded_classes 12389.0
    # HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
    # TYPE jvm_classes_unloaded_classes_total counter
    jvm_classes_unloaded_classes_total 11.0
    # HELP jvm_gc_pause_seconds Time spent in GC pause
    # TYPE jvm_gc_pause_seconds summary
    jvm_gc_pause_seconds_count{action="end of major GC",cause="Allocation Failure",} 1.0
    jvm_gc_pause_seconds_sum{action="end of major GC",cause="Allocation Failure",} 0.067
    jvm_gc_pause_seconds_count{action="end of minor GC",cause="Allocation Failure",} 32.0
    jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Allocation Failure",} 0.356
    # HELP jvm_gc_pause_seconds_max Time spent in GC pause
    # TYPE jvm_gc_pause_seconds_max gauge
    jvm_gc_pause_seconds_max{action="end of major GC",cause="Allocation Failure",} 0.0
    jvm_gc_pause_seconds_max{action="end of minor GC",cause="Allocation Failure",} 0.0
    # HELP jvm_memory_used_bytes The amount of used memory
    # TYPE jvm_memory_used_bytes gauge
    jvm_memory_used_bytes{area="heap",id="Tenured Gen",} 2.7371576E7
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.7684864E7
    jvm_memory_used_bytes{area="heap",id="Eden Space",} 1.0807632E7
    jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 6.3984912E7
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1290880.0
    jvm_memory_used_bytes{area="heap",id="Survivor Space",} 64784.0
    jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 7654504.0
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 4619392.0
    # HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
    # TYPE jvm_threads_peak_threads gauge
    jvm_threads_peak_threads 31.0
    # HELP process_files_open_files The open file descriptor count
    # TYPE process_files_open_files gauge
    process_files_open_files 171.0
    # HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
    # TYPE system_load_average_1m gauge
    system_load_average_1m 1.07
    # HELP system_cpu_usage The "recent cpu usage" for the whole system
    # TYPE system_cpu_usage gauge
    system_cpu_usage 0.198286866126692
    # HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
    # TYPE jvm_memory_max_bytes gauge
    jvm_memory_max_bytes{area="heap",id="Tenured Gen",} 1.44048128E8
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22912768E8
    jvm_memory_max_bytes{area="heap",id="Eden Space",} 5.767168E7
    jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 5828608.0
    jvm_memory_max_bytes{area="heap",id="Survivor Space",} 7143424.0
    jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22916864E8
    # HELP jvm_threads_daemon_threads The current number of live daemon threads
    # TYPE jvm_threads_daemon_threads gauge
    jvm_threads_daemon_threads 27.0
    # HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
    # TYPE jvm_threads_live_threads gauge
    jvm_threads_live_threads 31.0
    # HELP tomcat_sessions_expired_sessions_total
    # TYPE tomcat_sessions_expired_sessions_total counter
    tomcat_sessions_expired_sessions_total 0.0
    # HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
    # TYPE jvm_gc_memory_allocated_bytes_total counter
    jvm_gc_memory_allocated_bytes_total 4.01069008E8
    # HELP tomcat_sessions_alive_max_seconds
    # TYPE tomcat_sessions_alive_max_seconds gauge
    tomcat_sessions_alive_max_seconds 0.0
    # HELP process_uptime_seconds The uptime of the Java virtual machine
    # TYPE process_uptime_seconds gauge
    process_uptime_seconds 20930.109
    # HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
    # TYPE jvm_gc_memory_promoted_bytes_total counter
    jvm_gc_memory_promoted_bytes_total 1.0545424E7
    # HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
    # TYPE jvm_buffer_total_capacity_bytes gauge
    jvm_buffer_total_capacity_bytes{id="mapped",} 0.0
    jvm_buffer_total_capacity_bytes{id="direct",} 81920.0
    # HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
    # TYPE jvm_buffer_memory_used_bytes gauge
    jvm_buffer_memory_used_bytes{id="mapped",} 0.0
    jvm_buffer_memory_used_bytes{id="direct",} 81920.0
  • Python modules - In this case, the endpoint collects the output of all metric collectors provided by the Prometheus Client library (ProcessCollector, PlatformCollector, and GCCollector), including Ataccama’s placeholder info metric called microservice, which holds the name of the microservice.

    Response body example
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 1.479733248e+09
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 1.76336896e+08
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.60447759201e+09
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 1.7400000000000002
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 29.0
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1024.0
    # HELP python_info Python platform information
    # TYPE python_info gauge
    python_info{implementation="CPython",major="3",minor="8",patchlevel="5",version="3.8.5"} 1.0
    # HELP python_gc_objects_collected_total Objects collected during gc
    # TYPE python_gc_objects_collected_total counter
    python_gc_objects_collected_total{generation="0"} 9612.0
    python_gc_objects_collected_total{generation="1"} 1624.0
    python_gc_objects_collected_total{generation="2"} 6.0
    # HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
    # TYPE python_gc_objects_uncollectable_total counter
    python_gc_objects_uncollectable_total{generation="0"} 0.0
    python_gc_objects_uncollectable_total{generation="1"} 0.0
    python_gc_objects_uncollectable_total{generation="2"} 0.0
    # HELP python_gc_collections_total Number of times this generation was collected
    # TYPE python_gc_collections_total counter
    python_gc_collections_total{generation="0"} 471.0
    python_gc_collections_total{generation="1"} 42.0
    python_gc_collections_total{generation="2"} 3.0
    # HELP microservice_info Details of the microservice
    # TYPE microservice_info gauge
    microservice_info{name="cli_microservice"} 1.0
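
Both response bodies above use the Prometheus text exposition format, so they can be read with the same code. The following Python sketch is a simplified parser written for this article, not a replacement for a Prometheus client library. It skips comment lines and samples that carry a timestamp. It accepts the trailing comma inside the label braces that appears in the Java example. The function name parse_metrics is hypothetical.

```python
import re

# metric_name, optional {labels}, then a single value at the end of the line.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair such as id="direct"; escaped characters in values are allowed.
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # unsupported line shape, for example one with a timestamp
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For example, you can filter the parsed samples by name to read a single metric, such as process_open_fds for Python modules or jvm_threads_live_threads for Java modules.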

DPM and DPE health check metrics

Use the following properties to expose a module's health checks as monitoring metrics.

DPM

In on-premise deployments, these properties are provided in the dpm/etc/application.properties file.

Property Data type Description

ataccama.one.dpm.health-checks.expose-as-metrics

Boolean

Enables exposing DPM health checks as metrics.

Default value: false.

management.endpoint.health.mmm-be.enabled

Boolean

Enables MMM health checks.

Default value: false.

DPE

In on-premise deployments, these properties are provided in the dpe/etc/application.properties file.

Property Data type Description

ataccama.one.dpe.health-checks.expose-as-metrics

Boolean

Enables exposing DPE health checks as metrics.

Default value: true.

management.endpoint.health.dpm.enabled

Boolean

Enables DPM health checks.

Default value: true.

MMM metrics

In on-premise deployments, these properties are provided in the mmm-backend/etc/application.properties file.

Active users metric

The following properties configure how active user sessions in MMM are monitored.

Property Data type Description

ataccama.one.mmm.active-users.enabled

Boolean

Exposes the Prometheus metric (ataccama_one_mmm_activeusers) for monitoring active users. If disabled, the metric is not exposed and there is no memory footprint.

Default value: true.

ataccama.one.mmm.active-users.url

String

A comma-separated list of URLs on which calls count toward the active users metric. Calls to other URLs, such as the /actuator endpoint, are not counted as user activity.

Default value: /graphql.

ataccama.one.mmm.active-users.max-cache-size

Number

Sets the maximum number of entries in the active users cache.

Default value: 1000.

ataccama.one.mmm.active-users.user-idle-time

String

Configures the maximum period elapsed between subsequent calls made by an active user. If no calls are made during this period, the user is no longer counted as active.

Default value: 15m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.active-users.excluded-users

String

A comma-separated list of users that are excluded from the active user count (for example, monitoring). By default, the list is empty.
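
To make the interplay of these properties easier to follow, the following Python sketch models how an active user count could be derived from a list of calls. It only illustrates the semantics described above and is not MMM's implementation. The function name, the tuple layout of calls, and the prefix matching of URLs are assumptions. The cache size limit is not modeled.

```python
from datetime import datetime, timedelta


def count_active_users(
    calls,
    now: datetime,
    urls=("/graphql",),                   # mirrors active-users.url
    idle_time=timedelta(minutes=15),      # mirrors active-users.user-idle-time
    excluded_users=(),                    # mirrors active-users.excluded-users
) -> int:
    """Count users whose latest counted call is within idle_time of now.

    calls is an iterable of (user, url_path, timestamp) tuples.
    """
    last_seen = {}
    for user, path, when in calls:
        if user in excluded_users:
            continue  # excluded users never count as active
        if not any(path.startswith(prefix) for prefix in urls):
            continue  # calls to other URLs are not user activity
        if user not in last_seen or when > last_seen[user]:
            last_seen[user] = when
    return sum(1 for when in last_seen.values() if now - when <= idle_time)
```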

Health check metrics

Property Data type Description

management.endpoint.health.one-object-storage.enabled

Boolean

Enables exposing ONE Object Storage health checks as metrics.

Default value: ${management.endpoint.health.all-custom.enabled}.

ataccama.one.mmm.health-checks.one-object-storage.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to ONE Object Storage health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.one-object-storage.period

String

The schedule based on which the health check state is fetched. Applies to ONE Object Storage health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.one-object-storage.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to ONE Object Storage health checks.

Default value: 5m. For a full list of accepted units, see Duration units.

management.endpoint.health.dpm.enabled

Boolean

Enables exposing DPM health checks as metrics.

Default value: ${management.endpoint.health.all-custom.enabled}.

ataccama.one.mmm.health-checks.dpm.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to DPM health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.dpm.period

String

The schedule based on which the health check state is fetched. Applies to DPM health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.dpm.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to DPM health checks.

Default value: 5m.

management.endpoint.health.ai-core-anomaly.enabled

Boolean

Enables exposing Anomaly Detection microservice health checks as metrics.

Default value: ${management.endpoint.health.all-custom.enabled}.

ataccama.one.mmm.health-checks.ai-core-anomaly.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to Anomaly Detection microservice health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-anomaly.period

String

The schedule based on which the health check state is fetched. Applies to Anomaly Detection microservice health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-anomaly.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to Anomaly Detection microservice health checks.

Default value: 5m. For a full list of accepted units, see Duration units.

management.endpoint.health.ai-core-autocomplete.enabled

Boolean

Enables exposing Autocomplete microservice health checks as metrics.

Default value: false.

ataccama.one.mmm.health-checks.ai-core-autocomplete.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to Autocomplete microservice health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-autocomplete.period

String

The schedule based on which the health check state is fetched. Applies to Autocomplete microservice health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-autocomplete.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to Autocomplete microservice health checks.

Default value: 5m.

management.endpoint.health.ai-core-spell-checker.enabled

Boolean

Enables exposing Spellchecker microservice health checks as metrics.

Default value: false.

ataccama.one.mmm.health-checks.ai-core-spell-checker.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to Spellchecker microservice health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-spell-checker.period

String

The schedule based on which the health check state is fetched. Applies to Spellchecker microservice health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-spell-checker.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to Spellchecker microservice health checks.

Default value: 5m. For a full list of accepted units, see Duration units.

management.endpoint.health.ai-core-translator.enabled

Boolean

Enables exposing Translator microservice health checks as metrics.

Default value: false.

ataccama.one.mmm.health-checks.ai-core-translator.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to Translator microservice health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.ai-core-translator.period

String

The schedule based on which the health check state is fetched. Applies to Translator microservice health checks.

Default value: 30s.

ataccama.one.mmm.health-checks.ai-core-translator.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to Translator microservice health checks.

Default value: 5m. For a full list of accepted units, see Duration units.

management.endpoint.health.quartz-scheduler.enabled

Boolean

Enables exposing Quartz Scheduler health checks as metrics.

Default value: ${management.endpoint.health.all-custom.enabled}.

management.endpoint.health.quartz-scheduler.health-check-job-interval

String

Defines how often Quartz Scheduler health check jobs run.

Default value: 60s. For a full list of accepted units, see Duration units.

management.endpoint.health.quartz-scheduler.max-age

String

Configures the minimum period for which a Quartz Scheduler health check job should be running. If a health check job ends before that, the Scheduler is considered to be no longer running either.

Must be greater than management.endpoint.health.quartz-scheduler.health-check-job-interval.

Default value: 120s. For a full list of accepted units, see Duration units.

management.endpoint.health.all-custom.enabled

Boolean

Sets the default value of all management.endpoint.health.*.enabled properties. Used for enabling all health checks at once.

Default value: false.

ataccama.one.mmm.health-checks.expose-as-metrics

Boolean

Enables exposing MMM health checks as metrics.

Default value: false.

ataccama.one.mmm.health-checks.scheduler.pool.size

Number

The number of threads used for the stateful health indicator scheduler.

Default value: 2.

ataccama.one.mmm.health-checks.default.state-validity-timeout

String

The period of time during which an internal health check state remains valid.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.default.period

String

The schedule based on which the health check state is fetched.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.default.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out.

Default value: 5m.

management.endpoint.health.plugin-smtp-server.enabled

Boolean

Enables exposing the SMTP server plugin health checks as metrics.

Default value: false.

ataccama.one.mmm.health-checks.plugin-smtp-server.state-validity-timeout

String

The period of time during which an internal health check state remains valid. Applies to the SMTP server plugin health checks.

Default value: 2m. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.plugin-smtp-server.period

String

The schedule based on which the health check state is fetched. Applies to the SMTP server plugin health checks.

Default value: 30s. For a full list of accepted units, see Duration units.

ataccama.one.mmm.health-checks.plugin-smtp-server.execution-timeout

String

The period of time after which an attempt to fetch the health check state times out. Applies to the SMTP server plugin health checks.

Default value: 5m. For a full list of accepted units, see Duration units.
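
Several defaults in the preceding table are written as ${management.endpoint.health.all-custom.enabled}, that is, as a placeholder that takes its value from another property. The following Python sketch shows how such a reference resolves, using a plain dictionary in place of the real configuration. It is a simplified model: the function name resolve is hypothetical and the ${key:default} form is not handled.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(key: str, props: dict, _depth: int = 0) -> str:
    """Return the value of key with ${other.key} references substituted."""
    if _depth > 10:
        raise ValueError(f"placeholder nesting too deep while resolving {key!r}")
    value = props[key]
    return _PLACEHOLDER.sub(
        lambda match: resolve(match.group(1), props, _depth + 1), value
    )


if __name__ == "__main__":
    props = {
        "management.endpoint.health.all-custom.enabled": "false",
        "management.endpoint.health.dpm.enabled":
            "${management.endpoint.health.all-custom.enabled}",
    }
    print(resolve("management.endpoint.health.dpm.enabled", props))

    # Switching the shared property changes every default that refers to it.
    props["management.endpoint.health.all-custom.enabled"] = "true"
    print(resolve("management.endpoint.health.dpm.enabled", props))
```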

Additional settings

The following properties are listed for information purposes only and should not be modified.

Property Data type Description

management.endpoints.enabled-by-default

Boolean

Enables all actuator endpoints by default. If set to false, you can enable endpoints individually.

Default value: false.

management.endpoints.web.exposure.include

String

A comma-separated list of exposed actuator endpoints that should provide information about the application.

Default value: health,info,prometheus.

These endpoints provide the following:

  • health - The health status of the application.

  • info - General information about the application, such as build and Git details.

  • prometheus - All application metrics in a format that Prometheus can scrape.

management.endpoint.health.status.order

String

A comma-separated list that determines how the /actuator/health monitoring endpoint prioritizes application health statuses.

Default value: down,out-of-service,unknown,up.

management.info.git.mode

String

Configures how much information the /actuator/info monitoring endpoint retrieves from Git about the application source code repository.

To show all available information from the git.properties file, set the value to full. To display only basic information, such as the name of the branch, the commit identifier, and the time the commit was made, set the value to simple.

Default value: full.

management.endpoint.health.show-details

String

Specifies how much information is provided by the health monitoring endpoint.

The following values are available:

  • never - Health details are never displayed to any user.

  • when-authorized - Only authorized users have access to health information.

  • always - All users can see health details.

Default value: always.

management.endpoint.health.show-components

String

Specifies how much detail the health monitoring endpoint provides about the application components. You can also define which components are shown.

The following values are available:

  • never - Component information is never displayed to any user.

  • when-authorized - Only authorized users have access to information about components.

  • always - All users can see component details.

Default value: always.

management.metrics.web.server.request.autotime.enabled

Boolean

Controls the automatic timing of requests to Spring Boot endpoints. If set to true, all endpoints are monitored and request timing metrics are collected for them.

Default value: false.

ataccama.one.dpm.monitoring.resource-allocation.datasource.max-size

Number

Specifies the maximum number of metrics entries kept for data sources.

Default value: 1024.

ataccama.one.dpm.monitoring.resource-allocation.datasource.expire-after-duration

String

Specifies how long the data source metrics are kept in memory before they are erased.

Default value: 60s.
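
To illustrate the management.endpoint.health.status.order property listed in the table above, the following Python sketch derives an overall status from a set of component statuses: the status that appears earliest in the ordered list takes priority. The sketch is modeled on the commonly documented behavior of Spring Boot status aggregation and is not taken from the ONE code base. The function names are hypothetical, and the case-insensitive, punctuation-insensitive matching of status names is an assumption.

```python
DEFAULT_ORDER = "down,out-of-service,unknown,up"


def _uniform(code):
    """Normalize a status name: keep letters and digits only, lowercase."""
    return "".join(ch for ch in code if ch.isalnum()).lower()


def aggregate_status(component_statuses, order=DEFAULT_ORDER):
    """Return the component status with the highest priority in `order`.

    Statuses that do not appear in `order` are ignored. If nothing is left,
    "UNKNOWN" is returned (an assumption made for this sketch).
    """
    ranked = [_uniform(name) for name in order.split(",")]
    known = [s for s in component_statuses if _uniform(s) in ranked]
    if not known:
        return "UNKNOWN"
    return min(known, key=lambda s: ranked.index(_uniform(s)))


print(aggregate_status(["UP", "DOWN", "UP"]))
print(aggregate_status(["UP", "OUT_OF_SERVICE"]))
```

With the default order, a single component reporting DOWN is enough for the overall status to be DOWN, regardless of how many other components report UP.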

Application lifecycle examples

The following tables provide an overview of application statuses and expected HTTP response codes during each phase of the application lifecycle.

  • Startup

    Name     | Liveness status | Liveness HTTP response code | Readiness status  | Readiness HTTP response code
    Starting | N/A or BROKEN   | N/A or 4xx or 5xx           | REFUSING_TRAFFIC  | 503 Service Unavailable
    Started  | CORRECT         | 200 OK                      | REFUSING_TRAFFIC  | 503 Service Unavailable
    Ready    | CORRECT         | 200 OK                      | ACCEPTING_TRAFFIC | 200 OK

  • Failure

    Name      | Liveness status | Liveness HTTP response code | Readiness status  | Readiness HTTP response code
    Ready     | CORRECT         | 200 OK                      | ACCEPTING_TRAFFIC | 200 OK
    Not ready | CORRECT         | 200 OK                      | REFUSING_TRAFFIC  | 503 Service Unavailable
    Ready     | CORRECT         | 200 OK                      | ACCEPTING_TRAFFIC | 200 OK

  • Shutdown

    Name              | Liveness status | Liveness HTTP response code | Readiness status  | Readiness HTTP response code
    Running           | CORRECT         | 200 OK                      | ACCEPTING_TRAFFIC | 200 OK
    Graceful shutdown | CORRECT         | 200 OK                      | REFUSING_TRAFFIC  | 503 Service Unavailable
    Shutdown complete | N/A             | N/A                         | N/A               | N/A
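
The lifecycle tables above can be turned into a simple check that interprets the HTTP response codes of the liveness and readiness probes. The following Python sketch is one possible way to do this and is not part of the product: the function names and the returned labels are hypothetical, and a missing response (N/A in the tables) is represented as None.

```python
import urllib.error
import urllib.request


def fetch_status_code(url, timeout=5.0):
    """Return the HTTP status code for url, or None if there is no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses still carry a status code.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure and similar: N/A.
        return None


def classify(liveness_code, readiness_code):
    """Map probe response codes to a lifecycle label, following the tables."""
    if liveness_code != 200:
        # N/A, 4xx, or 5xx on liveness: starting, broken, or shut down.
        return "not live"
    if readiness_code == 200:
        # CORRECT and ACCEPTING_TRAFFIC.
        return "ready"
    # CORRECT but REFUSING_TRAFFIC (for example, 503 Service Unavailable).
    return "live, refusing traffic"


print(classify(200, 200))
print(classify(200, 503))
print(classify(None, None))
```

For a module running with default settings, the codes could be obtained with, for example, fetch_status_code("http://localhost:8021/actuator/health/liveness") and fetch_status_code("http://localhost:8021/actuator/health/readiness").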

Accepted units

Duration

Accepted units for time duration are as follows:

  • ns (nanoseconds)

  • us (microseconds)

  • ms (milliseconds)

  • s (seconds)

  • m (minutes)

  • h (hours)

  • d (days)
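
As a quick way to check values such as 2m, 30s, or 5m before placing them in a properties file, the following Python sketch converts a duration string that uses the units listed above into nanoseconds. It assumes that a value consists of a whole number immediately followed by one unit. That format is an assumption based on the default values shown in this article, and the function name is hypothetical.

```python
import re

# Nanoseconds per unit, for the units listed above.
_NANOS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}

# Assumed format: digits followed by exactly one unit, for example "30s".
_DURATION = re.compile(r"(\d+)(ns|us|ms|s|m|h|d)")


def duration_to_nanos(text):
    """Convert a duration such as "2m" to nanoseconds, or raise ValueError."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"Unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _NANOS_PER_UNIT[unit]


print(duration_to_nanos("2m"))
print(duration_to_nanos("500ms"))
```

Working in whole nanoseconds keeps the conversion exact for every listed unit, including ns and us.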
