Sizing Guidelines
This article provides basic information and strategies for setting up and configuring client-side components to achieve optimal performance and functionality. The recommendations in the following sections are based on tests performed in an environment that simulates customer conditions.
For more information about the testing process, including the data set used and the test server configuration, see section Test server configuration.
How is DPE server sizing calculated?
The sizing of the DPE server is mainly driven by the following parameters:
- The total size of data for processing (in MB).
- The size of the largest table in the data set (in MB).
- How many data processing jobs are expected to run in parallel on the DPE server.
Therefore, the necessary resources need to be adjusted as follows:
- CPU power: How much CPU power is needed depends on the total size of the customer data, the total number of tables in the data source, and the number of parallel processes. The higher any of these parameters, the more CPU power is required.
- Disk space: Depends on the size of the largest table and the number of parallel processes. The higher any of these parameters, the more disk space is required.
- Network throughput: The more CPU power is used, the higher the required network throughput.
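The sizing inputs listed above can be collected from a table inventory before any formula is applied. The following is a minimal Python sketch, not part of the product: the function names `size_to_mb` and `sizing_inputs` are hypothetical, the default of five parallel jobs mirrors the default mentioned in this article, and the conversion assumes binary units (1 MB = 1024 kB, 1 GB = 1024 MB), which the article does not specify.

```python
import re

# Assumption: binary units (1 MB = 1024 kB, 1 GB = 1024 MB).
_UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def size_to_mb(size: str) -> float:
    """Convert a size string such as '5032 kB' or '19 GB' to megabytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(kB|MB|GB)\s*", size)
    if match is None:
        raise ValueError(f"Unrecognized size: {size!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MB[unit]


def sizing_inputs(table_sizes: dict[str, str], parallel_jobs: int = 5) -> dict:
    """Derive the parameters that drive DPE sizing from a table inventory."""
    sizes_mb = {name: size_to_mb(size) for name, size in table_sizes.items()}
    largest = max(sizes_mb, key=sizes_mb.get)
    return {
        "total_size_mb": sum(sizes_mb.values()),
        "table_count": len(sizes_mb),
        "largest_table": largest,
        "largest_table_mb": sizes_mb[largest],
        "parallel_jobs": parallel_jobs,
    }


# Three tables taken from the test data source described in this article.
tables = {
    "dpx_test_data_5000x80": "5032 kB",
    "dpx_test_data_50000x80": "49 MB",
    "dpx_test_data_10000000x160": "19 GB",
}
inputs = sizing_inputs(tables)
print(inputs)
```

The resulting values (total size, table count, largest table, parallel jobs) are the inputs that the CPU, memory, and storage calculations in the following sections rely on.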
Calculate the number of CPUs
The minimum recommended number of CPUs or vCPUs for the DPE server is two. A DPE server deployed with fewer CPUs is still expected to provide reliable service, but the total processing time can increase significantly.
As mentioned previously, the number of CPUs recommended for the DPE server is calculated with regard to the following:
- The total size of the data source.
- The number of tables in the data source.
- The number of parallel processes expected to be running on the DPE server.
To keep the processing time limited to approximately 20 hours, as measured on the test data source described in Test server configuration, the number of CPUs can be estimated using the following formula:
By default, the number of parallel processes is set to five.
This calculation is valid for both physical and virtual x86 servers. In a public cloud environment, we recommend selecting the instance whose CPU and RAM characteristics match the given values as closely as possible.
If the calculation indicates that a single DPE server would need a high number of CPUs, which can lead to greater costs, we recommend deploying several DPE servers to distribute the load. The total number of CPUs across all DPE servers configured for a particular data source should match the calculated recommendation.
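As an illustration of distributing the load, the sketch below splits an already calculated CPU total across several DPE servers. It is only one possible approach: the function `plan_dpe_servers` and the per-server CPU cap are hypothetical, and the even split is an assumption. The only values taken from this article are the minimum of two CPUs per server and the rule that the combined CPU count should cover the calculated recommendation.

```python
import math

MIN_CPUS_PER_SERVER = 2  # minimum recommended in this article


def plan_dpe_servers(total_cpus_required: int, max_cpus_per_server: int) -> tuple[int, int]:
    """Return (number_of_servers, cpus_per_server) for an even split.

    total_cpus_required is the result of the CPU sizing calculation;
    max_cpus_per_server is a cap you choose, for example for cost reasons.
    """
    if total_cpus_required < 1:
        raise ValueError("total_cpus_required must be at least 1")
    if max_cpus_per_server < MIN_CPUS_PER_SERVER:
        raise ValueError("max_cpus_per_server must be at least 2")

    total = max(total_cpus_required, MIN_CPUS_PER_SERVER)
    servers = math.ceil(total / max_cpus_per_server)
    # Round up so that the combined CPU count never falls below the requirement.
    cpus_per_server = max(MIN_CPUS_PER_SERVER, math.ceil(total / servers))
    return servers, cpus_per_server


servers, cpus = plan_dpe_servers(total_cpus_required=20, max_cpus_per_server=8)
print(f"{servers} DPE servers with {cpus} CPUs each ({servers * cpus} CPUs in total)")
```

Rounding up means the plan can slightly exceed the calculated total; whether that trade-off is acceptable depends on your cost constraints.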
Calculate the memory (RAM) sizing
The minimum recommended memory size for the DPE server is 2 GB.
The actual memory requirement depends on the following:
- The size of the largest table in the data source.
- The number of parallel processes that are expected to run on the DPE server.
By default, the number of parallel processes is set to five.
To make sure that the DPE server is stable and efficient and that unnecessary disk caching is prevented, the RAM requirements for the DPE server should be calculated using the following formula:
This calculation is valid for both physical and virtual x86 servers. In a public cloud environment, we recommend selecting the instance whose CPU and RAM characteristics match the given values as closely as possible.
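Choosing a cloud instance that matches the calculated CPU and RAM values can be sketched as follows. The catalog entries and the function `closest_instance` are hypothetical placeholders, and "as closely as possible" is interpreted here as the smallest instance that meets or exceeds both requirements, which is an assumption rather than a documented rule.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    name: str
    cpus: int
    ram_gb: int


# Hypothetical catalog; replace with the instance types your provider offers.
CATALOG = [
    Instance("example-small", 2, 4),
    Instance("example-medium", 4, 8),
    Instance("example-large", 8, 16),
]


def closest_instance(required_cpus: int, required_ram_gb: float,
                     catalog: list[Instance] = CATALOG) -> Optional[Instance]:
    """Pick the instance with the least surplus that still meets both requirements."""
    candidates = [
        inst for inst in catalog
        if inst.cpus >= required_cpus and inst.ram_gb >= required_ram_gb
    ]
    if not candidates:
        return None  # no single instance is large enough
    # Prefer the smallest CPU surplus, then the smallest RAM surplus.
    return min(
        candidates,
        key=lambda inst: (inst.cpus - required_cpus, inst.ram_gb - required_ram_gb),
    )


choice = closest_instance(required_cpus=3, required_ram_gb=6)
print(choice)
```

If the function returns `None`, no single instance in the catalog covers the requirement, and the load would have to be spread over several servers.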
Calculate the storage sizing
The minimum recommended disk size for the DPE server is determined based on the following parameters:
- The size of the largest table in the data source.
- The number of parallel processes that are expected to run on the DPE server.
If a DPE process does not have sufficient disk space, the affected operation fails and must be manually restarted.
The required disk size can be calculated using the following formula:
In addition, we recommend using storage with the following characteristics:
- Maximum storage latency: < 3 ms.
- Minimum I/O rate: 5000 IOPS.
To save storage space, you can enable compression of the processed records kept in the file storage.
To do so, add the property plugin.executor-launch-model.ataccama.one.launch-type-properties.LOCAL.env.JAVA_OPTS=-Ddqd.io.storage.compress=compress to the DPE configuration.
For more information, see DPE Configuration, section Executor configuration.
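For readability, the compression property mentioned above is shown here on its own line as a configuration fragment. The comment line is added for illustration only. If `JAVA_OPTS` already carries other options in your deployment, they presumably need to be combined in the same value; this is an assumption, so verify it against the DPE Configuration article.

```properties
# Compress processed records kept in the file storage
plugin.executor-launch-model.ataccama.one.launch-type-properties.LOCAL.env.JAVA_OPTS=-Ddqd.io.storage.compress=compress
```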
Check the connectivity parameters
The way in which the connection path between the DPE server and the data sources is set up impacts the service performance and reliability. For the DPE server, the minimum required connection throughput is 1 Gbps.
If more network interfaces are needed, we recommend using interface bonding at the OS level or a similar grouping mechanism. To find out the recommended number of 1 Gbps network interfaces for your particular use case, use the following formula:
The recommended network round-trip time (RTT) between the DPE server and the data sources is under 6 ms.
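The fixed baseline values stated in this article (CPU and memory minimums, storage characteristics, connection throughput, and RTT) can be checked against measured values with a short script. This is a minimal sketch: the `MeasuredServer` class and `check_baselines` function are hypothetical, and the measurements themselves must come from your own tooling. It does not replace the sizing formulas, which depend on your data source.

```python
from dataclasses import dataclass


@dataclass
class MeasuredServer:
    cpus: int
    ram_gb: float
    storage_latency_ms: float
    storage_iops: int
    network_gbps: float
    rtt_ms: float  # round-trip time between the DPE server and the data source


def check_baselines(server: MeasuredServer) -> list[str]:
    """Return a list of baseline recommendations that the server does not meet."""
    problems = []
    if server.cpus < 2:
        problems.append("fewer than 2 CPUs")
    if server.ram_gb < 2:
        problems.append("less than 2 GB RAM")
    if server.storage_latency_ms >= 3:
        problems.append("storage latency is not under 3 ms")
    if server.storage_iops < 5000:
        problems.append("storage I/O rate below 5000 IOPS")
    if server.network_gbps < 1:
        problems.append("connection throughput below 1 Gbps")
    if server.rtt_ms >= 6:
        problems.append("RTT to data sources is not under 6 ms")
    return problems


candidate = MeasuredServer(cpus=4, ram_gb=8, storage_latency_ms=1.5,
                           storage_iops=8000, network_gbps=1, rtt_ms=2.0)
issues = check_baselines(candidate)
print("OK" if not issues else "\n".join(issues))
```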
Test server configuration
The sizing calculations are based on testing conducted in an Amazon Web Services (AWS) environment with the following instance types:
- m5.large, 2 CPUs, 4 GB RAM
- m5.xlarge, 4 CPUs, 8 GB RAM
- m5.2xlarge, 8 CPUs, 16 GB RAM
Testing data
A relational database was used to simulate the structure and size of actual customer data. The total data source size was 54.1 GB.
The following table provides more information about the individual assets in the data source.
Test data source structure
Table | Number of Columns | Number of Records | Table Size |
---|---|---|---|
dpx_test_data_5000x80 | 80 | 5 000 | 5032 kB |
dpx_test_data_5000x40 | 40 | 5 000 | 2560 kB |
dpx_test_data_5000x20 | 20 | 5 000 | 1360 kB |
dpx_test_data_5000x160 | 160 | 5 000 | 10088 kB |
dpx_test_data_5000x10 | 10 | 5 000 | 752 kB |
dpx_test_data_50000x80 | 80 | 50 000 | 49 MB |
dpx_test_data_50000x40 | 40 | 50 000 | 25 MB |
dpx_test_data_50000x20 | 20 | 50 000 | 13 MB |
dpx_test_data_50000x160 | 160 | 50 000 | 98 MB |
dpx_test_data_50000x10 | 10 | 50 000 | 7232 kB |
dpx_test_data_500000x80 | 80 | 500 000 | 488 MB |
dpx_test_data_500000x40 | 40 | 500 000 | 246 MB |
dpx_test_data_500000x20 | 20 | 500 000 | 129 MB |
dpx_test_data_500000x160 | 160 | 500 000 | 979 MB |
dpx_test_data_500000x10 | 10 | 500 000 | 70 MB |
dpx_test_data_5000000x80 | 80 | 5 000 000 | 4884 MB |
dpx_test_data_5000000x40 | 40 | 5 000 000 | 2460 MB |
dpx_test_data_5000000x20 | 20 | 5 000 000 | 1294 MB |
dpx_test_data_5000000x160 | 160 | 5 000 000 | 9794 MB |
dpx_test_data_5000000x10 | 10 | 5 000 000 | 703 MB |
dpx_test_data_50000000x20 | 20 | 5 000 000 | 2588 MB |
dpx_test_data_250000x80 | 80 | 250 000 | 244 MB |
dpx_test_data_250000x40 | 40 | 250 000 | 123 MB |
dpx_test_data_250000x20 | 20 | 250 000 | 65 MB |
dpx_test_data_250000x160 | 160 | 250 000 | 490 MB |
dpx_test_data_250000x10 | 10 | 250 000 | 35 MB |
dpx_test_data_2500000x80 | 80 | 2 500 000 | 2442 MB |
dpx_test_data_2500000x40 | 40 | 2 500 000 | 1230 MB |
dpx_test_data_2500000x20 | 20 | 2 500 000 | 647 MB |
dpx_test_data_2500000x160 | 160 | 2 500 000 | 4897 MB |
dpx_test_data_2500000x10 | 10 | 2 500 000 | 351 MB |
dpx_test_data_1000000x80 | 80 | 1 000 000 | 977 MB |
dpx_test_data_1000000x40 | 40 | 1 000 000 | 492 MB |
dpx_test_data_1000000x20 | 20 | 1 000 000 | 259 MB |
dpx_test_data_1000000x160 | 160 | 1 000 000 | 1959 MB |
dpx_test_data_1000000x10 | 10 | 1 000 000 | 141 MB |
dpx_test_data_10000000x80 | 80 | 10 000 000 | 9768 MB |
dpx_test_data_10000000x40 | 40 | 10 000 000 | 4920 MB |
dpx_test_data_10000000x160 | 160 | 10 000 000 | 19 GB |
dpx_test_data_10000000x10 | 10 | 10 000 000 | 1406 MB |