Setting up your own Apache Kafka cluster
Disclaimer!
This post does not cover anything new about Apache Kafka; it is meant as a guide for anyone who would like to install Apache Kafka on their own home lab cluster. I used 3 Ubuntu 24.04 Server nodes, each with the following configuration:
Lenovo ThinkCentre M700
i5-6500T CPU
8GB RAM
128GB SSD
The nodes cost me $60 each, so the total comes to about $225 once you add the Ethernet cables and switch. In addition to Kafka, I also use these nodes to run a MongoDB replica set and a Kubernetes cluster.
If you are an enterprise and would like to avoid the labor, risk, and guesswork of managing your own Kafka cluster, your answer is Confluent Cloud.
Installation
With the disclaimer out of the way, let's get started with the installation and configuration. The commands will be similar across Linux distributions.
Download the package from the Apache Kafka site, untar it, and move it into a base directory; I'm using /opt/kafka.
tar -zxvf kafka_2.13-3.8.0.tgz
mv kafka_2.13-3.8.0 /opt/kafka/
Install OpenJDK; in my case I'm using version 17:
sudo apt install openjdk-17-jre-headless
Configuration
Kafka has now moved on from ZooKeeper, and it's suggested to use KRaft for managing the cluster, so the following steps are based on KRaft.
The commands from here on are assumed to be run from the unarchived Kafka directory. Create the Kafka cluster ID on the first node, and reuse that same ID as an environment variable on all the other nodes.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
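On the remaining nodes, the storage directory must be formatted with the same cluster ID generated on the first node. A sketch, where the ID value is a placeholder you copy over from node 1:

```shell
# On node 1: print the generated ID so it can be copied to the other nodes
echo $KAFKA_CLUSTER_ID

# On nodes 2 and 3: export the same ID, then format the storage directory
export KAFKA_CLUSTER_ID="<id-copied-from-node-1>"
bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
```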
Configure the server properties (KAFKA_HOME/config/kraft/server.properties) with the cluster configuration details. The major ones are:
controller.quorum.voters=1@192.168.1.199:9093,2@192.168.1.200:9093,3@192.168.1.181:9093
# node.id must be unique for each node
node.id=1
# This is the IP address of the machine hosting this server.properties.
# Every node will have its own unique address.
listeners=PLAINTEXT://192.168.1.199:9092,CONTROLLER://192.168.1.199:9093
advertised.listeners=PLAINTEXT://192.168.1.199:9092
log.dirs=/opt/kafka/logDir
num.partitions=6
offsets.topic.replication.factor=2
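For illustration, on my second node (192.168.1.200) the node-specific properties would look like this; only node.id and the listener addresses change, while controller.quorum.voters stays identical on every node:

```
node.id=2
listeners=PLAINTEXT://192.168.1.200:9092,CONTROLLER://192.168.1.200:9093
advertised.listeners=PLAINTEXT://192.168.1.200:9092
```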
Create the service definition at /etc/systemd/system/kafka.service. Substitute /opt/kafka/kafka_2.13-3.8.0/ with your KAFKA_HOME variable's value, or wherever the installation is.
[Unit]
Description=Kafka Service
After=network.target
[Service]
Type=forking
User=kafka
Environment=KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
Environment=KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent"
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh -daemon /opt/kafka/kafka_2.13-3.8.0/config/kraft/server.properties
ExecStop=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-stop.sh
Restart=on-failure
LimitNOFILE=infinity
[Install]
WantedBy=default.target
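The unit above runs as User=kafka, which none of the steps so far has created. Assuming a dedicated system user and the installation directory used earlier, something like:

```shell
# Create a system user for the broker (no login shell)
sudo useradd -r -s /usr/sbin/nologin kafka
# Give it ownership of the installation and the log directory
sudo chown -R kafka:kafka /opt/kafka
```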
Run the following sequence of commands on all the nodes to start the cluster:
sudo systemctl daemon-reload
sudo systemctl start kafka.service
sudo systemctl status kafka.service
The output of the final command on each node should look similar to this, confirming that the cluster is up and running:
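To have the broker come back up automatically after a reboot, you can also enable the unit on each node:

```shell
sudo systemctl enable kafka.service
```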
Verification
The first step is to check the cluster quorum status:
./bin/kafka-metadata-quorum.sh --bootstrap-controller 192.168.1.199:9093 describe --status
The commands to create, describe, write to, and read from topics (substitute the IP address with one from your cluster):
./bin/kafka-topics.sh --create --topic first-topic --bootstrap-server 192.168.1.199:9092 --replication-factor 2
./bin/kafka-topics.sh --describe --bootstrap-server 192.168.1.199:9092 --topic first-topic
./bin/kafka-console-producer.sh --topic first-topic --bootstrap-server 192.168.1.199:9092
./bin/kafka-console-consumer.sh --topic first-topic --from-beginning --bootstrap-server 192.168.1.199:9092
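The console producer and consumer above are interactive; for a quick scripted smoke test you can also pipe a message through, using --max-messages so the consumer exits after reading one record:

```shell
echo "hello from node 1" | ./bin/kafka-console-producer.sh --topic first-topic --bootstrap-server 192.168.1.199:9092
./bin/kafka-console-consumer.sh --topic first-topic --from-beginning --max-messages 1 --bootstrap-server 192.168.1.199:9092
```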
If all is working well, the results should look similar to this:
Conduktor
If you're managing a Kafka cluster, it can be a challenge to remember all the commands, and you may want a UI application for managing it. I use Conduktor's free desktop version, which provides the basic information. A paid version with more features is available.
Network connectivity
As you may have noticed, I've used IP addresses in most of the commands and configurations. This can be avoided by adding host records to the /etc/hosts file of the respective nodes, something like this:
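A sketch of such /etc/hosts entries, using the IPs from this post; the hostnames (kafka-1, etc.) are my own choice, so pick whatever fits your lab:

```
192.168.1.199 kafka-1
192.168.1.200 kafka-2
192.168.1.181 kafka-3
```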
The configuration files are available in my GitHub repository.