
Sizing Guidelines

This article provides basic information and strategies for setting up and configuring client-side components to achieve optimal performance and functionality. The recommendations in the following sections are based on tests performed in a test environment that simulates customer environment conditions.

For more information about the testing process, including the data set used and the test server configuration, see section Test server configuration.

How is DPE server sizing calculated?

The sizing of the DPE server is mainly driven by the following parameters:

  • The total size of data for processing (in MB).

  • The size of the largest table in the data set (in MB).

  • How many data processing jobs are expected to run in parallel on the DPE server.

Therefore, the necessary resources need to be adjusted as follows:

  • CPU power: How much CPU power is needed depends on the total size of the customer data, the total number of tables in the data source, as well as the number of parallel processes. The higher any of these parameters, the more CPU power is required.

  • Disk space: Depends on the size of the largest table and the number of parallel processes. The higher any of these parameters, the more disk space is required.

  • Network throughput: The more CPU power is used, the higher the required network throughput.

Calculate the number of CPUs

The minimum recommended number of CPUs or vCPUs for the DPE server is two. While deploying the DPE server with a lower number of CPUs than recommended is still expected to provide reliable services, the total processing time can be significantly affected.

As mentioned previously, the number of CPUs recommended for the DPE server is calculated with regard to the following:

  • The total size of the data source.

  • The number of tables in the data source.

  • The number of parallel processes expected to be running on the DPE server.

To keep the processing time limited to approximately 20 hours, as measured on the test data source described in Test server configuration, the number of CPUs can be estimated using the following formula:

[Formula image: Calculate the number of CPUs]
By default, the number of parallel processes is set to five.

This calculation is valid for both physical and virtual x86 servers. In a public cloud environment we recommend selecting the instance whose CPU and RAM characteristics match the given values as closely as possible.

If the calculation indicates that a single DPE server would need a high number of CPUs, which can lead to greater costs, we recommend deploying more DPE servers to distribute the load. The total number of CPUs across all DPE servers configured for a particular data source should be in line with the calculated recommendation.
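To illustrate how a recommended CPU total might be spread over several DPE servers, here is a minimal sketch. It assumes you already have the total from the CPU formula. The function name `split_cpus` and the per-server limit `max_cpus_per_server` are hypothetical and are not part of the product. The only value taken from this article is the minimum of two CPUs per server.

```python
import math

MIN_CPUS_PER_SERVER = 2  # minimum recommended in this article


def split_cpus(total_cpus: int, max_cpus_per_server: int) -> list[int]:
    """Spread a recommended CPU total across several DPE servers (illustrative only)."""
    if max_cpus_per_server < MIN_CPUS_PER_SERVER:
        raise ValueError("per-server limit is below the recommended minimum")
    total = max(total_cpus, MIN_CPUS_PER_SERVER)
    servers = math.ceil(total / max_cpus_per_server)
    base, extra = divmod(total, servers)
    # The first `extra` servers get one more CPU so the sum covers `total`;
    # every server is kept at or above the recommended minimum.
    return [max(MIN_CPUS_PER_SERVER, base + 1 if i < extra else base)
            for i in range(servers)]


print(split_cpus(20, 8))  # [7, 7, 6]
```

The sum of the returned values is never lower than the requested total, which keeps the combined capacity in line with the calculated recommendation.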

Calculate the memory (RAM) sizing

The minimum recommended memory size for the DPE server is 2 GB.

The required memory size depends on the following:

  • The size of the largest table in the data source.

  • The number of parallel processes that are expected to run on the DPE server.

By default, the number of parallel processes is set to five.

To make sure that the DPE server is stable and efficient and that unnecessary disk caching is prevented, the RAM requirements for the DPE server should be calculated using the following formula:

[Formula image: Calculate the RAM sizing]

This calculation is valid for both physical and virtual x86 servers. In a public cloud environment, we recommend selecting the instance whose CPU and RAM characteristics match the given values as closely as possible.

Calculate the storage sizing

The minimum recommended disk size for the DPE server is determined based on the following parameters:

  • The size of the largest table in the data source.

  • The number of parallel processes that are expected to run on the DPE server.

If a DPE process does not have sufficient disk space, the affected operation fails and must be restarted manually.

The required disk size can be calculated using the following formula:

[Formula image: Calculate the storage sizing]

In addition, we recommend using storage with the following characteristics:

  • Maximum storage latency: < 3 ms.

  • Minimum I/O rate: 5000 IOPS.

To save storage space, you can enable compression of the processed records kept in the file storage. To do this, add the following property to the DPE configuration:

    plugin.executor-launch-model.ataccama.one.launch-type-properties.LOCAL.env.JAVA_OPTS=-Ddqd.io.storage.compress=compress

For more information, see DPE Configuration, section Executor configuration.

Check the connectivity parameters

The way in which the connection path between the DPE server and the data sources is set up impacts the service performance and reliability. For the DPE server, the minimum required connection throughput is 1 Gbps.

If more network interfaces are needed, we recommend using interface binding on the OS level or a similar grouping mechanism. To find out the recommended number of 1 Gbps network interfaces for your particular use case, use the following formula:

[Formula image: Connectivity parameters]

The recommended network round-trip time (RTT) between the DPE server and the data sources is under 6 ms.
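The fixed minimums stated in this article (2 CPUs, 2 GB RAM, storage latency under 3 ms, at least 5000 IOPS, 1 Gbps throughput, RTT under 6 ms) can be collected into a simple pre-deployment check. The sketch below is illustrative only. The `DpeHost` structure and the `check_minimums` function are hypothetical names, and the measured values must come from your own tooling. The check covers only these baseline thresholds, not the results of the sizing formulas.

```python
from dataclasses import dataclass


@dataclass
class DpeHost:
    """Measured characteristics of a candidate DPE server (hypothetical structure)."""
    cpus: int
    ram_gb: float
    storage_latency_ms: float
    storage_iops: int
    network_gbps: float
    rtt_ms: float


def check_minimums(host: DpeHost) -> list[str]:
    """Return the unmet baseline recommendations; an empty list means all are met."""
    problems = []
    if host.cpus < 2:
        problems.append("fewer than 2 CPUs")
    if host.ram_gb < 2:
        problems.append("less than 2 GB RAM")
    if host.storage_latency_ms >= 3:
        problems.append("storage latency is not under 3 ms")
    if host.storage_iops < 5000:
        problems.append("storage delivers fewer than 5000 IOPS")
    if host.network_gbps < 1:
        problems.append("network throughput below 1 Gbps")
    if host.rtt_ms >= 6:
        problems.append("RTT to data sources is not under 6 ms")
    return problems


# Example values are made up for illustration.
host = DpeHost(cpus=4, ram_gb=8, storage_latency_ms=1.5,
               storage_iops=3000, network_gbps=1, rtt_ms=7.2)
for problem in check_minimums(host):
    print(problem)
```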

Test server configuration

Sizing calculations were designed based on testing conducted in an Amazon Web Services (AWS) environment with the following instance types:

  • m5.large, 2 CPUs, 4 GB RAM

  • m5.xlarge, 4 CPUs, 8 GB RAM

  • m5.2xlarge, 8 CPUs, 16 GB RAM

Testing data

A relational database was used to simulate the structure and size of actual customer data. The total data source size was 54.1 GB.

The following table provides more information about the individual assets in the data source.

Test data source structure

Table                       Number of Columns  Number of Records  Table Size
dpx_test_data_5000x80       80                 5 000              5032 kB
dpx_test_data_5000x40       40                 5 000              2560 kB
dpx_test_data_5000x20       20                 5 000              1360 kB
dpx_test_data_5000x160      160                5 000              10088 kB
dpx_test_data_5000x10       10                 5 000              752 kB
dpx_test_data_50000x80      80                 50 000             49 MB
dpx_test_data_50000x40      40                 50 000             25 MB
dpx_test_data_50000x20      20                 50 000             13 MB
dpx_test_data_50000x160     160                50 000             98 MB
dpx_test_data_50000x10      10                 50 000             7232 kB
dpx_test_data_500000x80     80                 500 000            488 MB
dpx_test_data_500000x40     40                 500 000            246 MB
dpx_test_data_500000x20     20                 500 000            129 MB
dpx_test_data_500000x160    160                500 000            979 MB
dpx_test_data_500000x10     10                 500 000            70 MB
dpx_test_data_5000000x80    80                 5 000 000          4884 MB
dpx_test_data_5000000x40    40                 5 000 000          2460 MB
dpx_test_data_5000000x20    20                 5 000 000          1294 MB
dpx_test_data_5000000x160   160                5 000 000          9794 MB
dpx_test_data_5000000x10    10                 5 000 000          703 MB
dpx_test_data_50000000x20   20                 5 000 000          2588 MB
dpx_test_data_250000x80     80                 250 000            244 MB
dpx_test_data_250000x40     40                 250 000            123 MB
dpx_test_data_250000x20     20                 250 000            65 MB
dpx_test_data_250000x160    160                250 000            490 MB
dpx_test_data_250000x10     10                 250 000            35 MB
dpx_test_data_2500000x80    80                 2 500 000          2442 MB
dpx_test_data_2500000x40    40                 2 500 000          1230 MB
dpx_test_data_2500000x20    20                 2 500 000          647 MB
dpx_test_data_2500000x160   160                2 500 000          4897 MB
dpx_test_data_2500000x10    10                 2 500 000          351 MB
dpx_test_data_1000000x80    80                 1 000 000          977 MB
dpx_test_data_1000000x40    40                 1 000 000          492 MB
dpx_test_data_1000000x20    20                 1 000 000          259 MB
dpx_test_data_1000000x160   160                1 000 000          1959 MB
dpx_test_data_1000000x10    10                 1 000 000          141 MB
dpx_test_data_10000000x80   80                 10 000 000         9768 MB
dpx_test_data_10000000x40   40                 10 000 000         4920 MB
dpx_test_data_10000000x160  160                10 000 000         19 GB
dpx_test_data_10000000x10   10                 10 000 000         1406 MB
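
The sizing inputs described earlier (the total data size and the size of the largest table, both in MB) can be derived from a listing like the one above. The following sketch shows one way to do it for a few rows copied from the test data. The helper `size_in_mb` is a hypothetical name, and the conversion assumes binary multiples (1 MB = 1024 kB, 1 GB = 1024 MB), which this article does not state explicitly.

```python
# Assumption: units are binary multiples (1 MB = 1024 kB, 1 GB = 1024 MB).
UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def size_in_mb(text: str) -> float:
    """Convert a size such as '5032 kB' or '19 GB' to megabytes."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MB[unit]


# A few rows copied from the test data table; extend with your own tables.
tables = {
    "dpx_test_data_5000x80": "5032 kB",
    "dpx_test_data_500000x160": "979 MB",
    "dpx_test_data_10000000x160": "19 GB",
}

sizes_mb = {name: size_in_mb(size) for name, size in tables.items()}
largest = max(sizes_mb, key=sizes_mb.get)
total_mb = sum(sizes_mb.values())

print(f"largest table: {largest} ({sizes_mb[largest]:.0f} MB)")
print(f"total size of listed tables: {total_mb:.1f} MB")
```

The two printed values correspond to the "largest table" and "total size of data" parameters that drive the CPU, RAM, and storage calculations.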
