Monitoring Configuration
The following article describes how monitoring is managed in Ataccama ONE modules.
All properties mentioned in this article are supplied through the Configuration Service, the corresponding deployment of the module, or the <module>/etc/application.properties file.
Monitoring endpoints and probes
For monitoring purposes, ONE exposes the following endpoints:
- /actuator/info
- /actuator/health
- /actuator/health/liveness
- /actuator/health/readiness
- /actuator/prometheus
By default, all endpoints are enabled for each module, and authorization is disabled on all of them except the Prometheus endpoint.
If default values are used during installation, the endpoints are available at the following URLs:
- Configuration Service: localhost:8011
- Metadata Management Module (MMM): http://localhost:8021
- ONE Web Application: http://localhost:8023
- Data Processing Module (DPM): http://localhost:8031
- Data Processing Engine (DPE): http://localhost:8034
- AI Core: http://localhost:8041
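For scripted checks, it can help to expand these base URLs into full endpoint URLs. The following Python sketch assumes a default installation, plain HTTP for every module (including the Configuration Service, whose scheme is not stated above), and that every module exposes all five endpoint paths. The names `DEFAULT_BASE_URLS`, `endpoint_urls`, and `all_endpoint_urls` are illustrative only.

```python
from typing import Dict, List

# Default base URLs from a default installation, as listed above.
# Assumption: every module, including the Configuration Service, is served over plain HTTP.
DEFAULT_BASE_URLS: Dict[str, str] = {
    "Configuration Service": "http://localhost:8011",
    "MMM": "http://localhost:8021",
    "ONE Web Application": "http://localhost:8023",
    "DPM": "http://localhost:8031",
    "DPE": "http://localhost:8034",
    "AI Core": "http://localhost:8041",
}

# Monitoring endpoint paths exposed by ONE modules.
ENDPOINT_PATHS: List[str] = [
    "/actuator/info",
    "/actuator/health",
    "/actuator/health/liveness",
    "/actuator/health/readiness",
    "/actuator/prometheus",
]


def endpoint_urls(base_url: str) -> List[str]:
    """Return the full monitoring URLs for one module base URL."""
    base = base_url.rstrip("/")  # avoid a double slash when joining
    return [base + path for path in ENDPOINT_PATHS]


def all_endpoint_urls() -> Dict[str, List[str]]:
    """Map each module name to its list of monitoring URLs."""
    return {module: endpoint_urls(url) for module, url in DEFAULT_BASE_URLS.items()}


if __name__ == "__main__":
    for module, urls in all_endpoint_urls().items():
        print(module, urls)
```

If your installation uses different hosts or ports, only the `DEFAULT_BASE_URLS` mapping needs to change.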
Info endpoint
The endpoint is enabled through the following property:
management.endpoint.info.enabled=true
The /actuator/info endpoint returns arbitrary informational content about the application.
For example, for Java modules, this covers build and Git information.
Depending on the module type, requests are expected to return the following response body structures:
- Java modules: Java applications return detailed information about the build, the Git commit, and the repository. Git properties are provided to the applications through a Gradle plugin.
Response body example
{
  "git": {
    "branch": "master",
    "commit": {
      "time": "2020-09-18T17:34:41Z",
      "message": {
        "full": "Merge branch 'release-13.0-6' [increase-minor]\n",
        "short": "Merge branch 'release-13.0-6' [increase-minor]"
      },
      "id": {
        "describe": "release-1.4.0-1-g49783b4",
        "abbrev": "49783b4",
        "full": "49783b4cd9c04a9148c309f6c33a1d4753344ef2"
      },
      "user": {
        "email": "john.smith@ataccama.com",
        "name": "john.smith"
      }
    },
    "build": {
      "version": "1.5.0-master.178-g49783b4-SNAPSHOT",
      "user": {
        "name": "ata.jenkins",
        "email": "ata.jenkins@ataccama.com"
      },
      "host": "96b5b38c19f4"
    },
    "dirty": "false",
    "tags": "",
    "total": {
      "commit": {
        "count": "178"
      }
    },
    "closest": {
      "tag": {
        "commit": {
          "count": "1"
        },
        "name": "release-1.4.0"
      }
    },
    "remote": {
      "origin": {
        "url": "ssh://git@bitbucket.atc.services:7999/MDD/one-metadata-web-server.git"
      }
    }
  },
  "build": { }
}
- Python modules: The response uses a custom format that describes the application and its microservice.
Response body example
{
  "app": {
    "description": "Ataccama One 2.0 - AI Core - neighbors",
    "version": "13.0.0-rc4",
    "microservice": "neighbors"
  }
}
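Because Java and Python modules return differently shaped bodies, a client that only needs the module version has to handle both. The following minimal Python sketch is based solely on the two example bodies above (`git.build.version` for Java modules, `app.version` for Python modules); the function name `module_version` is hypothetical, and real responses may contain other fields or omit these ones.

```python
import json
from typing import Any, Dict, Optional


def module_version(info: Dict[str, Any]) -> Optional[str]:
    """Extract a version string from an /actuator/info response body."""
    # Python modules, e.g. {"app": {"version": "13.0.0-rc4", ...}}
    app = info.get("app")
    if isinstance(app, dict) and isinstance(app.get("version"), str):
        return app["version"]
    # Java modules: in the example above, the version sits under git.build.version.
    git = info.get("git")
    if isinstance(git, dict):
        build = git.get("build")
        if isinstance(build, dict) and isinstance(build.get("version"), str):
            return build["version"]
    return None  # unrecognized response shape


if __name__ == "__main__":
    body = '{"app": {"description": "Ataccama One 2.0 - AI Core - neighbors", "version": "13.0.0-rc4", "microservice": "neighbors"}}'
    print(module_version(json.loads(body)))
```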
Health endpoint
The endpoint is enabled through the following property:
management.endpoint.health.enabled=true
Requests made to the /actuator/health endpoint return information about the state of components and their dependencies, such as database status or disk space:
- Java modules
Response body - general example
{
  "status": "UP",
  "components": {
    "context": {
      "status": "UP",
      "details": {
        "startupDate": "2021-03-03T12:25:57.638Z"
      }
    },
    "db": {
      "status": "UP",
      "details": {
        "database": "PostgreSQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 1023226937344,
        "free": 607449665536,
        "threshold": 10485760,
        "exists": true
      }
    },
    "livenessState": {
      "status": "UP"
    },
    "model": {
      "status": "UP",
      "details": {
        "modelStatus": "UP",
        "modelVersion": 2
      }
    },
    "ping": {
      "status": "UP"
    },
    "readinessState": {
      "status": "UP"
    }
  },
  "groups": [ "liveness", "readiness" ]
}
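When a module reports anything other than UP, the next step is usually to find the component responsible. The following Python sketch walks the `components` object of a health response body like the one above and lists every component whose status is not UP. It assumes that composite components nest their own `components` object, as Spring Boot Actuator does; the function name is illustrative.

```python
from typing import Any, Dict, List


def unhealthy_components(health: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return 'path=STATUS' entries for every component that is not UP."""
    problems: List[str] = []
    for name, component in health.get("components", {}).items():
        path = prefix + name
        status = component.get("status")
        if status != "UP":
            problems.append(f"{path}={status}")
        # Recurse into composite components, which carry their own "components" object.
        problems.extend(unhealthy_components(component, prefix=path + "."))
    return problems


if __name__ == "__main__":
    body = {"status": "DOWN", "components": {"db": {"status": "DOWN"}, "ping": {"status": "UP"}}}
    print(unhealthy_components(body))
```

The `details` objects are deliberately not inspected; only the `status` fields decide whether a component is reported.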
Liveness probe
The following properties configure the endpoint:
Property | Data type | Description |
---|---|---|
| Boolean | Enables Default value: |
| String | Defines which components are covered by the liveness probe. These components are a subset of Default value: |
Requests made to the /actuator/health/liveness endpoint are expected to return the following HTTP response codes based on the liveness status:
- Status CORRECT: A 200 OK response indicating that the module is alive.
- Status N/A or BROKEN: A response in the 4xx or 5xx range if the module cannot be reached or there is a failure. No response at all also indicates an issue with the module.
The status returned is UP only if the status of all components is UP as well.
Response body examples
{
"status":"UP"
}
{
"status": "UP",
"components": {
"diskSpace": {
"status": "UP",
"details": {
"total": 1023226937344,
"free": 607881547776,
"threshold": 10485760,
"exists": true
}
},
"ping": {
"status": "UP"
}
}
}
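A liveness check only needs the HTTP status code, and a missing response must count as a failure. The Python sketch below uses only the standard library and, following the list above, treats 200 OK as alive and any other code, or no response at all, as a failure. The function names and the 5-second timeout are illustrative assumptions.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def fetch_status_code(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code of a GET request, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError, which still carries the status code.
        return err.code
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failures, timeouts, and malformed responses: no usable response.
        return None


def is_alive(status_code: Optional[int]) -> bool:
    """Only 200 OK means the module is alive; other codes or no response mean a failure."""
    return status_code == 200


if __name__ == "__main__":
    code = fetch_status_code("http://localhost:8021/actuator/health/liveness")
    print("alive" if is_alive(code) else f"not alive (status: {code})")
```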
Readiness probe
The following properties configure the /actuator/health/readiness endpoint:
Property | Data type | Description |
---|---|---|
| Boolean | Enables Default value: |
| String | Defines which components are covered by the readiness probe. These components are a subset of Default value: Default value for DPM and DPE: Default value for MMM: |
Depending on the readiness status, the following HTTP responses are returned:
- Status UP: A 200 OK response. The status returned is UP only if the status of all components and dependencies is UP as well.
- Status OUT_OF_SERVICE: A 503 Service Unavailable response. This happens when a component or a subsystem of components is out of service and the application should therefore not accept traffic.
- Status DOWN: A 503 Service Unavailable response. This is typically caused by an unexpected failure.
- Status UNKNOWN: A 200 OK response.
Response body examples
{
"status":"UP|OUT_OF_SERVICE|DOWN|UNKNOWN"
}
{
"status":"UP",
"components":{
"db":{
"status":"UP",
"details":{
"database":"PostgreSQL",
"validationQuery":"isValid()"
}
}
}
}
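The overall status is derived from the component statuses, and the HTTP code follows from the overall status. The Python sketch below models this with the default severity order of Spring Boot Actuator's status aggregator (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) and the status-to-code mapping listed above. It is an illustration under the assumption that ONE modules keep these Spring Boot defaults, not the code the modules run; statuses outside the four standard values are ignored.

```python
from typing import Dict, Iterable, List

# Default Spring Boot Actuator severity order, most severe first (assumed to apply to ONE modules).
STATUS_ORDER: List[str] = ["DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN"]

# HTTP response codes per readiness status, as listed above.
HTTP_CODES: Dict[str, int] = {
    "UP": 200,
    "OUT_OF_SERVICE": 503,
    "DOWN": 503,
    "UNKNOWN": 200,
}


def aggregate_status(component_statuses: Iterable[str]) -> str:
    """Return the most severe known status, or UNKNOWN if there is none."""
    known = [status for status in component_statuses if status in STATUS_ORDER]
    if not known:
        return "UNKNOWN"
    return min(known, key=STATUS_ORDER.index)


def readiness_http_code(component_statuses: Iterable[str]) -> int:
    """HTTP code a readiness request would be expected to return for these components."""
    return HTTP_CODES[aggregate_status(component_statuses)]


if __name__ == "__main__":
    print(readiness_http_code(["UP", "OUT_OF_SERVICE"]))
```

Under this ordering, one OUT_OF_SERVICE or DOWN component is enough to turn the whole readiness response into 503 Service Unavailable.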
Prometheus endpoint
The following properties configure the /actuator/prometheus endpoint:
Property | Data type | Description |
---|---|---|
| Boolean | Enables the Default value: |
| String | Enables ACL-based authentication on the selected endpoint. The same filter can be enabled on other endpoints. Default value: |
| String | Allows access to the endpoint defined in the endpoint-filter property for the selected user roles. Default value: |
By default, the endpoint is secured using HTTP basic authentication and only the monitoring role is allowed to communicate with it.
The only defined response status code is 200 OK.
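Because the Prometheus endpoint is protected with HTTP basic authentication, a scraper or script must send the credentials of a user with the monitoring role. The following minimal Python sketch builds such a request with the standard library; the user name and password are placeholders, and the base URL assumes the default MMM port.

```python
import base64
import urllib.request


def prometheus_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for /actuator/prometheus with an HTTP basic auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/actuator/prometheus",
        headers={"Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    # Placeholder credentials for a user that has the monitoring role.
    request = prometheus_request("http://localhost:8021", "monitoring-user", "change-me")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.read().decode("utf-8")[:200])
    except OSError as err:
        print(f"request failed: {err}")
```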
Requests made to the Prometheus endpoint return application metrics with relevant information about the running module. Currently, only general processing information is provided, such as RAM and CPU usage. However, it is possible to provide specific runtime information for each ONE module as well.
For an overview of module-specific monitoring metrics, see Monitoring Metrics.
Depending on how the module is implemented, you can expect the following response body structure:
- Java modules: Java modules use metrics provided by Micrometer, a metrics library that is automatically integrated with Spring Boot Actuator.
Response body example
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 1.44048128E8
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage 0.001402737059119283
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{id="mapped",} 0.0
jvm_buffer_count_buffers{id="direct",} 10.0
# HELP tomcat_sessions_active_current_sessions
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions 0.0
# HELP tomcat_sessions_rejected_sessions_total
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{area="heap",id="Tenured Gen",} 3.2444416E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.80224E7
jvm_memory_committed_bytes{area="heap",id="Eden Space",} 1.31072E7
jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 6.6060288E7
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
jvm_memory_committed_bytes{area="heap",id="Survivor Space",} 1572864.0
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 8388608.0
jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 4980736.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count 1.0
# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.602080758644E9
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{level="warn",} 2.0
logback_events_total{level="debug",} 0.0
logback_events_total{level="error",} 1.0
logback_events_total{level="trace",} 0.0
logback_events_total{level="info",} 73.0
# HELP tomcat_sessions_created_sessions_total
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total 0.0
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{state="runnable",} 7.0
jvm_threads_states_threads{state="blocked",} 0.0
jvm_threads_states_threads{state="waiting",} 18.0
jvm_threads_states_threads{state="timed-waiting",} 6.0
jvm_threads_states_threads{state="new",} 0.0
jvm_threads_states_threads{state="terminated",} 0.0
# HELP process_files_max_files The maximum file descriptor count
# TYPE process_files_max_files gauge
process_files_max_files 1048576.0
# HELP tomcat_sessions_active_max_sessions
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions 0.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes 1.9466048E7
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 12389.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total 11.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",cause="Allocation Failure",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Allocation Failure",} 0.067
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Allocation Failure",} 32.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Allocation Failure",} 0.356
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of major GC",cause="Allocation Failure",} 0.0
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Allocation Failure",} 0.0
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="Tenured Gen",} 2.7371576E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.7684864E7
jvm_memory_used_bytes{area="heap",id="Eden Space",} 1.0807632E7
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 6.3984912E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1290880.0
jvm_memory_used_bytes{area="heap",id="Survivor Space",} 64784.0
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 7654504.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 4619392.0
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 31.0
# HELP process_files_open_files The open file descriptor count
# TYPE process_files_open_files gauge
process_files_open_files 171.0
# HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
# TYPE system_load_average_1m gauge
system_load_average_1m 1.07
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.198286866126692
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="heap",id="Tenured Gen",} 1.44048128E8
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22912768E8
jvm_memory_max_bytes{area="heap",id="Eden Space",} 5.767168E7
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 5828608.0
jvm_memory_max_bytes{area="heap",id="Survivor Space",} 7143424.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22916864E8
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads 27.0
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads 31.0
# HELP tomcat_sessions_expired_sessions_total
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total 4.01069008E8
# HELP tomcat_sessions_alive_max_seconds
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds 0.0
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds 20930.109
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total 1.0545424E7
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{id="mapped",} 0.0
jvm_buffer_total_capacity_bytes{id="direct",} 81920.0
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{id="mapped",} 0.0
jvm_buffer_memory_used_bytes{id="direct",} 81920.0
- Python modules: The endpoint collects the output of all metric collectors provided by the Prometheus Python client library (ProcessCollector, PlatformCollector, and GCCollector), together with Ataccama's placeholder info metric called microservice (exposed as microservice_info), which holds the name of the microservice.
Response body example
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.479733248e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.76336896e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.60447759201e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.7400000000000002
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 29.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="8",patchlevel="5",version="3.8.5"} 1.0
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 9612.0
python_gc_objects_collected_total{generation="1"} 1624.0
python_gc_objects_collected_total{generation="2"} 6.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 471.0
python_gc_collections_total{generation="1"} 42.0
python_gc_collections_total{generation="2"} 3.0
# HELP microservice_info Details of the microservice
# TYPE microservice_info gauge
microservice_info{name="cli_microservice"} 1.0
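Both response bodies use the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by one sample per line, with optional labels in braces. For quick checks without a Prometheus server, a small standard-library parser can be enough. The sketch below handles only the simple cases shown above: it skips comments, ignores optional timestamps, and does not unescape label values. The function names are illustrative; for anything beyond ad hoc checks, the Prometheus Python client library includes its own text-format parser.

```python
import re
from typing import Dict, List, Optional, Tuple

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def first_value(
    samples: List[Sample], metric: str, wanted_labels: Optional[Dict[str, str]] = None
) -> Optional[float]:
    """Return the value of the first sample named `metric` whose labels include `wanted_labels`."""
    wanted = wanted_labels or {}
    for name, labels, value in samples:
        if name == metric and all(labels.get(k) == v for k, v in wanted.items()):
            return value
    return None
```

For example, `first_value(parse_samples(body), "jvm_buffer_count_buffers", {"id": "direct"})` reads the direct buffer count from the Java response above.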
MMM active users metric
The following properties configure how active user sessions in MMM are monitored.
Property | Data type | Description |
---|---|---|
| Boolean | Exposes the Prometheus metric ( Enabled by default. Default value: |
| String | A comma-separated list of URLs where the user metric is active. For example, calling the Default value: |
| Number | Sets the maximum number of entries in the active users cache. Default value: |
| String | Configures the maximum period elapsed between subsequent calls made by an active user. If no calls are made during this period, the user is no longer counted as active. Default value: |
| String | A comma-separated list of users that are excluded from the active user count (for example, monitoring). By default, the list is empty. |
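The active user count can be thought of as a bounded cache of last-call timestamps. The following Python sketch is a hypothetical model of the semantics described in the table (a maximum number of cache entries, an inactivity period, and a list of excluded users), not MMM's implementation; the class, method, and parameter names are invented for illustration. Timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import OrderedDict
from typing import Iterable, List


class ActiveUserTracker:
    """Hypothetical model of the MMM active-user count described above."""

    def __init__(self, max_entries: int, timeout_seconds: float, excluded: Iterable[str] = ()):
        self.max_entries = max_entries
        self.timeout_seconds = timeout_seconds
        self.excluded = set(excluded)
        self._last_seen: "OrderedDict[str, float]" = OrderedDict()  # user -> time of last call

    def record_call(self, user: str, now: float) -> None:
        """Record a call made by `user` at time `now` (in seconds)."""
        if user in self.excluded:
            return  # e.g. technical users such as monitoring are never counted
        self._last_seen[user] = now
        self._last_seen.move_to_end(user)  # keep the most recently seen user last
        while len(self._last_seen) > self.max_entries:
            self._last_seen.popitem(last=False)  # evict the least recently seen user

    def active_count(self, now: float) -> int:
        """Count users whose last call happened within the inactivity period."""
        return sum(1 for seen in self._last_seen.values() if now - seen <= self.timeout_seconds)

    def tracked_users(self) -> List[str]:
        """Users currently held in the cache, least recently seen first."""
        return list(self._last_seen)
```

Evicting the least recently seen user when the cache is full is a design choice of this sketch; the eviction policy MMM uses is not described here.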
Additional settings
The following properties are listed for information purposes only and should not be modified.
Property | Data type | Description |
---|---|---|
| Boolean | Enables all actuator endpoints. If set to Default value: |
| String | A comma-separated list of exposed actuator endpoints that should provide information about the application. Default value: These endpoints track the following: |
| String | A comma-separated list that determines how the Default value: |
| String | Configures how much information the To show all available information from the Default value: |
| String | Specifies how much information is provided by the health monitoring endpoint. The following values are available: Default value: |
| String | Specifies how much detail the health monitoring endpoint provides about the application components. You can also define which components are shown. The following values are available: Default value: |
| Boolean | Refers to the instrumentation of all Spring Boot endpoints. If set to Default value: |
Application lifecycle examples
The following tables provide an overview of application statuses and expected HTTP response codes during each phase of the application lifecycle.
- Startup
Name | Liveness status | Liveness HTTP response code | Readiness status | Readiness HTTP response code |
---|---|---|---|---|
Starting | N/A or BROKEN | N/A or 4xx or 5xx | REFUSING_TRAFFIC | 503 Service Unavailable |
Started | CORRECT | 200 OK | REFUSING_TRAFFIC | 503 Service Unavailable |
Ready | CORRECT | 200 OK | ACCEPTING_TRAFFIC | 200 OK |
- Failure
Name | Liveness status | Liveness HTTP response code | Readiness status | Readiness HTTP response code |
---|---|---|---|---|
Ready | CORRECT | 200 OK | ACCEPTING_TRAFFIC | 200 OK |
Not ready | CORRECT | 200 OK | REFUSING_TRAFFIC | 503 Service Unavailable |
Ready | CORRECT | 200 OK | ACCEPTING_TRAFFIC | 200 OK |
- Shutdown
Name | Liveness status | Liveness HTTP response code | Readiness status | Readiness HTTP response code |
---|---|---|---|---|
Running | CORRECT | 200 OK | ACCEPTING_TRAFFIC | 200 OK |
Graceful shutdown | CORRECT | 200 OK | REFUSING_TRAFFIC | 503 Service Unavailable |
Shutdown complete | N/A | N/A | N/A | N/A |
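Orchestrators typically react to the two probes differently: in Kubernetes, for example, a failing liveness probe leads to a container restart, while a failing readiness probe only stops traffic from being routed to the instance. The Python sketch below applies that rule to the response codes in the tables above. The action names and the `in_startup_grace_period` flag, which stands in for a startup delay or startup probe, are illustrative assumptions.

```python
from typing import Optional


def probe_action(
    liveness_code: Optional[int],
    readiness_code: Optional[int],
    in_startup_grace_period: bool = False,
) -> str:
    """Decide how an orchestrator could react to one round of probe results.

    A code of None means that no response was received.
    """
    if liveness_code != 200:
        # During startup, a module that does not answer yet is expected ("Starting").
        return "wait" if in_startup_grace_period else "restart"
    if readiness_code != 200:
        # Alive but refusing traffic: "Started", "Not ready", or "Graceful shutdown".
        return "stop_routing"
    return "route_traffic"
```

Applied to the Startup table, this yields "wait", "stop_routing", and "route_traffic" for the three phases in order.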