TubeMQ vs Kafka
2 Test scenario design
The following test plan was designed based on our actual application scenarios:
[Figure: the test scenarios, roughly illustrated with characters from “The Avengers”]
Specific analysis of the data:
- With a single topic and a single instance, TubeMQ's throughput is far lower than Kafka's. With a single topic and multiple instances, TubeMQ's throughput rises with the number of instances and catches up with Kafka's 5-partition configuration at 4 instances, whereas Kafka's throughput does not rise but falls as partitions are added. TubeMQ can also raise its throughput dynamically by adjusting various parameters while the system is running;
- With multiple topics and multiple instances, TubeMQ's throughput stays within a very stable range and its resource consumption, including file handles and network connections, is very low, whereas Kafka's throughput drops significantly as the number of topics grows and its resource consumption rises sharply. With SATA disk storage, TubeMQ's throughput rises with better hardware models until it hits the disk bottleneck, while Kafka's throughput is unstable. On the CG1 model with SSD disks, Kafka's throughput is better than TubeMQ's;
- For filtered consumption, TubeMQ greatly reduces the server's outbound network traffic, and filtered consumption uses fewer resources than full consumption, which in turn raises TubeMQ's throughput. Kafka has no server-side filtering: its outbound traffic is the same as for full consumption, so no significant traffic is saved;
- The two systems differ in resource consumption. TubeMQ uses sequential writes and random reads, which consume a lot of CPU. Kafka uses sequential writes and block reads, which consume very little CPU but a lot of other resources, such as file handles and network connections. In a real SaaS operating environment, Kafka's dependence on ZooKeeper becomes a system bottleneck, and large numbers of producers, consumers, and brokers impose further limits on resources such as file handles and network connections, so its resource consumption is higher;
4 Test environment and configuration
4.2 [Broker hardware model configuration]
4.3 [Broker system configuration]
5.1 Scenario 1: Basic scenario with a single topic and a one-in, two-out model; compare the performance of TubeMQ and Kafka using different consumption modes, different message sizes, and horizontal scaling of partitions
5.1.1 【Conclusion】
With a single topic and varying numbers of partitions:
- Kafka's throughput decreases slightly as partitions are added, and its CPU usage is very low;
- Because TubeMQ partitions are logical, adding partitions does not affect its throughput; each Kafka partition adds physical files, and adding partitions actually reduces Kafka's inbound and outbound traffic;
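The difference between logical and physical partitions can be made concrete by counting data files under a deliberately simplified model. This is a sketch, not either system's real storage layout. Assumptions not taken from the test report: each Kafka partition holds `segments` log segments, each backed by three files (`.log`, `.index`, `.timeindex`); a TubeMQ topic keeps a fixed set of files per instance, with `files_per_instance` a hypothetical placeholder. The function names are likewise hypothetical.

```python
# Deliberately simplified file-count model, for illustration only.
# Real counts depend on segment rolling, retention, and storage layout.

KAFKA_FILES_PER_SEGMENT = 3  # assumption: .log, .index and .timeindex


def kafka_data_files(topics: int, partitions: int, segments: int = 1) -> int:
    """Each Kafka partition is its own physical log, so files scale with partitions."""
    return topics * partitions * segments * KAFKA_FILES_PER_SEGMENT


def tubemq_data_files(topics: int, instances: int, files_per_instance: int = 2) -> int:
    """TubeMQ partitions are logical, so the partition count does not appear here.

    files_per_instance is a hypothetical placeholder (for example, one data
    file plus one index file per instance).
    """
    return topics * instances * files_per_instance


if __name__ == "__main__":
    for partitions in (1, 5, 10):
        # Kafka grows linearly with partitions; TubeMQ stays flat at 1 instance.
        print(partitions, kafka_data_files(1, partitions), tubemq_data_files(1, 1))
```

Under this model, Kafka's file count grows linearly with partitions while TubeMQ's depends only on the number of instances, which is consistent with the observation that adding TubeMQ partitions leaves throughput unchanged.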
" class="reference-link">5.1.2 【Indicators】
5.2.1 【Conclusion】
Combining the test data from Scenario 1 and Scenario 2:
- As the number of instances increases, TubeMQ's throughput increases. At 4 instances its throughput matches Kafka's, with lower disk IO utilization and higher CPU usage than Kafka;
- TubeMQ's consumption mode affects system throughput: the memory read mode (301) performs worse than the file read mode (101), but it reduces message latency;
- Kafka's system throughput does not improve as expected as the number of partitions increases;
- When TubeMQ's instances (physical files) are increased to match Kafka's partitions, its throughput rises accordingly, reaching and exceeding Kafka's 5-partition result at 4 instances. TubeMQ can change how it reads data according to business needs or system configuration, which allows system throughput to be raised dynamically. As the number of partitions increases, Kafka's inbound traffic decreases;
5.2.2 【Indicators】
Note 1: All of the following tests use a single topic with varying numbers of partitions or instances and different read modes; each message packet is 1K in length;
Note 2: The read mode is selected by setting qryPriorityId to the corresponding value through admin_upd_def_flow_control_rule.
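The two qryPriorityId values used in these tests trade throughput against latency: 101 (file read) gave higher throughput, while 301 (memory read) gave lower message latency. The helper below is a hypothetical illustration of choosing a value from a workload goal based on that observed trade-off. It does not call admin_upd_def_flow_control_rule, whose parameters are not reproduced in this report, and `choose_qry_priority_id` and `Goal` are invented names.

```python
from enum import Enum

# qryPriorityId values as used in this report; the goal-to-value mapping
# is an illustrative assumption, not part of any TubeMQ API.
FILE_READ_MODE = 101    # higher throughput in these tests
MEMORY_READ_MODE = 301  # lower message latency in these tests


class Goal(Enum):
    THROUGHPUT = "throughput"
    LATENCY = "latency"


def choose_qry_priority_id(goal: Goal) -> int:
    """Hypothetical helper: pick the read mode that favours the given goal."""
    if goal is Goal.LATENCY:
        return MEMORY_READ_MODE
    return FILE_READ_MODE


if __name__ == "__main__":
    print(choose_qry_priority_id(Goal.THROUGHPUT), choose_qry_priority_id(Goal.LATENCY))
```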
5.3 Scenario 3: Multi-topic scenario with fixed message size, number of instances, and number of partitions; examine the performance of TubeMQ and Kafka with 100, 200, 500, and 1000 topics
5.3.1 【Conclusion】
In the multi-topic tests:
- As the number of topics increases, TubeMQ's production and consumption performance stays at a consistent average level without especially large traffic fluctuations, and it occupies few file handles, little memory, and few network connections (about 7,500 file handles and 150 network connections with 1K topics), but its CPU usage is relatively high;
- After TubeMQ's consumption mode is switched from memory reads to file reads, throughput increases greatly and CPU usage decreases, so services with different performance requirements can be served differently;
- As the number of topics increases, Kafka's throughput drops significantly. Kafka's traffic also fluctuates violently, storage and consumption lag during long-term operation, throughput decreases noticeably, and memory, file handles, and network connections are consumed in very large numbers (with 1K topics, network connections reach 12,000 and file handles reach 45,000);
- In this comparison, TubeMQ runs more stably than Kafka: its throughput is steady and does not drop over long runs, and its resource usage is small, although its CPU usage still needs to be addressed in later versions;
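The resource figures reported for the 1K-topic run can be normalized per topic, and for Kafka per partition, using the 10 partitions per topic configured in this scenario. The sketch below only rearranges the reported numbers and adds no new measurements; the dictionary and function names are illustrative.

```python
# Figures reported for the 1K-topic run; this only normalizes them.
TOPICS = 1000
PARTITIONS_PER_TOPIC = 10  # from the scenario configuration

reported = {
    "TubeMQ": {"file_handles": 7_500, "connections": 150},
    "Kafka": {"file_handles": 45_000, "connections": 12_000},
}


def per_topic(system: str, resource: str) -> float:
    """Average amount of a resource per topic."""
    return reported[system][resource] / TOPICS


def kafka_to_tubemq_ratio(resource: str) -> float:
    """How many times more of a resource Kafka used than TubeMQ."""
    return reported["Kafka"][resource] / reported["TubeMQ"][resource]


if __name__ == "__main__":
    for system in reported:
        print(system, per_topic(system, "file_handles"), per_topic(system, "connections"))
    print("Kafka file handles per partition:",
          reported["Kafka"]["file_handles"] / (TOPICS * PARTITIONS_PER_TOPIC))
    print("ratios:", kafka_to_tubemq_ratio("file_handles"),
          kafka_to_tubemq_ratio("connections"))
```

At 1K topics this works out to about 7.5 file handles per topic for TubeMQ versus 45 per topic (4.5 per partition) for Kafka, so Kafka used 6 times as many file handles and 80 times as many network connections.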
5.3.2 【Indicators】
Note: In the following scenarios, the packet length is 1K and the number of partitions is 10.
5.4 Scenario 4: 100 topics, one input with one full output and five partially filtered outputs: one consumer group pulls all topics in full, and five different consumer groups each consume the same 20 topics and select 10% of the message content
5.4.1 【Conclusion】
- TubeMQ filters on the server side, so its outbound traffic differs significantly from its inbound traffic;
- TubeMQ's server-side filtering frees resources for production, so production performance improves compared with the non-filtering case;
- Kafka filters on the client side: its inbound traffic does not improve, its outbound traffic is almost twice its inbound traffic, and both inbound and outbound traffic are unstable;
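Why client-side filtering roughly doubles outbound traffic in this scenario follows from its definition: one group reads all 100 topics, and five groups each read the same 20 topics but want only 10% of those messages. The sketch below is an idealized estimate, not a measured result, and assumes traffic is spread evenly across topics; `outbound_multiple` is a hypothetical name.

```python
def outbound_multiple(full_groups: int, filter_groups: int, topic_share: float,
                      keep_fraction: float, server_side: bool) -> float:
    """Outbound traffic expressed as a multiple of inbound traffic.

    Idealized model, assuming traffic is spread evenly across topics:
    - each full-consumption group receives 100% of inbound traffic;
    - each filtering group subscribes to topic_share of the topics;
    - with server-side filtering the broker sends only keep_fraction of
      those messages; with client-side filtering it sends all of them.
    """
    sent_fraction = keep_fraction if server_side else 1.0
    return full_groups + filter_groups * topic_share * sent_fraction


# Scenario 4: 1 full group, 5 filtering groups on 20 of 100 topics, keep 10%.
client_side = outbound_multiple(1, 5, 20 / 100, 0.10, server_side=False)
server_side = outbound_multiple(1, 5, 20 / 100, 0.10, server_side=True)
print(f"client-side filtering: {client_side:.2f}x inbound")
print(f"server-side filtering: {server_side:.2f}x inbound")
```

The client-side case gives 2.0 times the inbound traffic, consistent with Kafka's outbound traffic being almost twice its inbound traffic; under the same idealized assumptions, server-side filtering would send only about 1.1 times the inbound traffic.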
5.4.2 【Indicators】
Note: In the following scenario, there are 100 topics, the packet length is 1K, and the number of partitions is 10
5.5 Scenario 5: Comparison of data consumption latency between TubeMQ and Kafka
Remarks: On the TubeMQ consumer side, when consumption has caught up with production, a pull can find no data, and by default the consumer then waits 200 ms before pulling again. When testing this item, set the TubeMQ consumer's pull wait (ConsumerConfig.setMsgNotFoundWaitPeriodMs()) to 10 ms, or set the flow control policy to 10 ms.
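Why the wait period matters for a latency test can be shown with a simplified model, which is an assumption and not TubeMQ's actual consumer loop: after an empty pull the consumer sleeps for the wait period, so a message that arrives during the sleep is not fetched until the sleep ends. Pull round-trip time is ignored, and the function names are hypothetical.

```python
def added_delay_ms(arrival_ms: float, wait_ms: float) -> float:
    """Extra delay for a message arriving arrival_ms after an empty pull.

    Simplified model: the consumer sleeps wait_ms after an empty pull and
    then pulls again, so the message waits until the sleep ends.
    """
    if arrival_ms < 0 or arrival_ms > wait_ms:
        raise ValueError("arrival must fall within the wait window")
    return wait_ms - arrival_ms


def mean_added_delay_ms(wait_ms: float, samples: int = 1000) -> float:
    """Average extra delay for arrivals spread evenly across the wait window."""
    step = wait_ms / samples
    # Sample the midpoint of each sub-interval of the wait window.
    total = sum(added_delay_ms((i + 0.5) * step, wait_ms) for i in range(samples))
    return total / samples


if __name__ == "__main__":
    for wait in (200, 10):
        print(wait, "ms wait -> mean extra delay", round(mean_added_delay_ms(wait), 1), "ms")
```

Under this model, lowering the wait from 200 ms to 10 ms reduces the worst-case extra delay from about 200 ms to about 10 ms and the average from about 100 ms to about 5 ms, which would otherwise dominate a latency comparison.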
5.6.1 【Conclusion】
- Adjusting the size of TubeMQ's per-topic memory cache can have a positive effect on throughput; in practice, it should be tuned to the resources of the machine;
- In practice, a larger memory cache is not always better; the value needs to be set appropriately;
5.6.2 【Indicators】
Note: In the following scenarios, consumption uses PULL consumption in the memory read mode (301), and each message packet is 1K in length
5.7.2 【Indicators】
5.8 Scenario 8: Evaluate the performance of the two systems across different hardware models
5.8.1 【Conclusion】
- TubeMQ's throughput is higher on the BX1 model than on the TS60 model, but it cannot rise any further because IO util reaches the bottleneck; on the CG1 model, throughput is higher still than on BX1;
- Kafka's system throughput on the BX1 model is unstable and lower than on the TS60 model; on the CG1 model it reaches its highest level, saturating the 10G network card;
- With SATA disk storage, TubeMQ's performance indicators improve significantly as the hardware configuration improves, whereas Kafka's decrease rather than increase with better hardware models;
- With SSD disk storage, Kafka has the best performance indicators, and TubeMQ's indicators are not as good as Kafka's;
- The CG1 model's data disk is small (only 2.2T) and fills within 90 minutes under RAID 10, so long-term operation of the two systems could not be tested.
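As a rough check on the CG1 capacity limit, the average write rate implied by filling 2.2T in 90 minutes can be computed. Assumptions not stated in the report: 2.2T means usable capacity in decimal terabytes, all written data is retained for the whole run, and filesystem and index overhead are ignored. `min_write_rate_mb_s` is a hypothetical name.

```python
def min_write_rate_mb_s(capacity_tb: float, minutes: float) -> float:
    """Average write rate (decimal MB/s) needed to fill capacity_tb within `minutes`.

    Assumes decimal units (1 TB = 10**12 bytes), that written data is kept
    for the whole run, and ignores filesystem and index overhead.
    """
    return capacity_tb * 10**12 / (minutes * 60) / 10**6


rate = min_write_rate_mb_s(2.2, 90)
print(f"{rate:.0f} MB/s, about {rate * 8 / 1000:.1f} Gbit/s")
```

Filling the disk within 90 minutes therefore implies a sustained write rate of roughly 407 MB/s (about 3.3 Gbit/s) or more, which is why long-running tests did not fit on this model.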
5.8.2 【Indicators】
Note 1: In the following scenarios, 500 topics with 10 partitions each are configured, and the message packet size is 1K bytes;
Note 2: TubeMQ consumes in the 301 memory read mode;
6 Appendix
6.1 Appendix 1: Resource usage charts for different models:
6.1.1 [BX1 model test]
6.1.2 [CG1 model test]
6.2 Appendix 2: Resource usage charts for multi-topic testing:
6.2.1 [100 topics]
6.2.2 [200 topics]
6.2.3 [500 topics]
6.2.4 [1000 topics]