Apache Kafka - Getting Started!
- anithamca
- Nov 12, 2021
- 3 min read
Updated: Dec 9, 2021
This blog is written to help students kick-start with the basics of Kafka in a few minutes.
Apache Kafka is an open-source, community-driven distributed event streaming platform capable of handling trillions of events a day.
Kafka Basic Terminologies
The most popular definition of Kafka is a 'distributed commit log'. Microservice applications exchange events using topics/streams in real time; data is processed immediately, so Kafka can be used for real-time analytics and recommendations.
Kafka provides multi-producer and multi-consumer support. In legacy messaging systems, once a message is consumed from a topic by a consumer it is lost, whereas Kafka retains messages so that multiple consumers can read them. This moves application-to-application (A2A) integration from a tightly coupled to a loosely coupled architecture. A few terminologies:
Producer - Application that sends messages to Kafka
Consumer - Application that receives messages from Kafka
Broker - Kafka server
Cluster - Group of computers running Kafka brokers
Topic - Name of a Kafka stream
Partition - Part of a topic
Offset - Unique ID of a message within a partition
Consumer Group - Group of consumers acting as a single unit.

Fig1: Kafka Terminologies

Fig2 : Kafka Cluster
Imagine a simple topic xyz created in Kafka with 3 partitions. Messages produced are distributed across all 3 partitions, and Kafka ensures that the same message will not be read by more than one consumer from the same consumer group. Messages are persisted in disk-based storage for a pre-configured amount of time, so even if a consumer crashes or fails, the messages remain available for continued processing.
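The partition-and-offset behaviour described above can be sketched with a toy, stdlib-only Python simulation. Note this is only an illustration: Kafka's real default partitioner hashes the record key with murmur2, while this sketch uses crc32, and the topic name and key format are made up.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    """Pick a partition by hashing the key (crc32 here just for
    illustration; Kafka's default partitioner uses murmur2)."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# topic "xyz": one append-only log per partition; a message's index
# within its partition is its offset
topic_xyz = defaultdict(list)

def produce(key: str, value: str) -> tuple:
    p = partition_for(key)
    topic_xyz[p].append(value)
    return p, len(topic_xyz[p]) - 1   # (partition, offset)

# nine messages spread across the 3 partitions
for i in range(9):
    produce(f"user-{i}", f"event-{i}")
```

Because the partition is derived from the key, all messages with the same key land in the same partition and therefore stay in order relative to each other.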

Fig3 : Simple Topic xyz
Kafka originated at LinkedIn for activity tracking. Stream processing helps in reading and transforming messages, which opens up various use cases for big data and event-driven applications. Twitter, Uber, Yahoo, Netflix, and a huge number of other applications use Kafka in the real world. Brokers, ZooKeeper, producers, and consumers are the major components of Kafka.
Kafka High Level Architecture Diagram

Fig 4: E2E High Level Architecture of Kafka
A standalone or local Kafka setup includes only ZooKeeper and a Kafka cluster with 1 or more brokers. A cloud version of Kafka is also provided by Confluent, which includes Schema Registry, Kafka Streams, Kafka Connect, and ksqlDB as out-of-the-box plugins or libraries.
A Kafka Record

Fig 5 : Kafka Record attributes
Reference Doc: https://kafka.apache.org/23/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html
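The Javadoc linked above lists the attributes a Kafka record carries. As a rough, stdlib-only Python sketch of that shape (field names follow the ProducerRecord constructor; the sample topic and key are made up):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProducerRecordSketch:
    # Mirrors the attributes of Kafka's ProducerRecord (see the
    # Javadoc linked above); only topic and value are mandatory.
    topic: str
    value: str
    key: Optional[str] = None          # drives partition choice when set
    partition: Optional[int] = None    # explicit partition overrides the key hash
    timestamp: Optional[int] = None    # epoch millis; broker assigns one if None
    headers: list = field(default_factory=list)

rec = ProducerRecordSketch(topic="ani-topic", value="hello", key="user-1")
```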
Setting up Kafka Local Installation
- Standard Kafka
Note: Homebrew is used as the package manager for Mac.

Standard Kafka
1. Install Brew:
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
2. Download and install JDK 11
3. Download the Kafka binary: https://kafka.apache.org/downloads
4. Install the Kafka CLI: $ brew install kafka
5. Edit the ZooKeeper & Kafka configs using a text editor (go to the config folder):
zookeeper.properties: dataDir=/your/path/to/data/zookeeper
server.properties: log.dirs=/your/path/to/data/kafka
6. Start ZooKeeper:
$ cd /Users/<yourname>/Documents/kafka_2.12-2.1.1/bin
$ ./zookeeper-server-start.sh ../config/zookeeper.properties
7. Start Kafka, which communicates with ZooKeeper:
$ ./kafka-server-start.sh ../config/server.properties
Apache Kafka CLI
Below are some CLI commands to quickly try producing and consuming Kafka records.
$kafka-topics --list --zookeeper localhost:2181
$kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ani-topic
$kafka-topics --describe --zookeeper localhost:2181 --topic ani-topic
$kafka-console-consumer --bootstrap-server localhost:9092 --topic ani-topic
$kafka-console-producer --broker-list localhost:9092 --topic ani-topic
<<produce some messages and validate that the consumer in the other terminal is able to consume them>>
$kafka-topics --delete --zookeeper localhost:2181 --topic ani-topic
Note: For delete to work, add the following to Kafka's server.properties:
delete.topic.enable=true
Java/Spring Boot Client for Kafka Producer /Consumer
Configuration APIs
Major APIs - Admin API,Stream API,Connect API,Producer API,Consumer API
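As a starting point for the Spring Boot client, the producer and consumer are typically wired through application.properties. A minimal sketch follows; the property names come from Spring for Apache Kafka auto-configuration, while the group id and serializer choices are assumptions matching the local setup above:

```properties
# broker started earlier on the default port
spring.kafka.bootstrap-servers=localhost:9092
# consumer side
spring.kafka.consumer.group-id=ani-consumer-group
spring.kafka.consumer.auto-offset-reset=earliest
# producer side (plain String key/value serializers)
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
```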
Confluent Kafka
Kafka UI Tools
Conduktor - https://www.conduktor.io/download/ (licensed; developers can use it for a local single-cluster setup)
Kafdrop - https://towardsdatascience.com/kafdrop-e869e5490d62
Open-source Kafka Tool - https://www.kafkatool.com/
Confluent - Confluent Control Center (cloud)
Offset Management in Kafka
Current Offset - Position of the records already sent to the consumer
Committed Offset - Offset up to which records have already been processed by a consumer
Properties: enable.auto.commit (default true), auto.commit.interval.ms (default 5 seconds)
Manual approach - commitSync(), commitAsync()
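To see why the committed offset matters, here is a toy stdlib-only Python simulation (not the Kafka client API): the consumer advances a current position as it reads, but only the committed offset survives a restart, so records processed after the last commit are re-read, giving at-least-once delivery.

```python
messages = [f"m{i}" for i in range(10)]   # one partition's log

committed = 0      # offset the consumer group has committed so far
processed = []

def poll_and_process(batch_size: int, commit: bool) -> None:
    """Process a batch starting at the committed offset; commit the
    new position only if asked (mimicking commitSync after processing)."""
    global committed
    current = committed
    for off in range(current, min(current + batch_size, len(messages))):
        processed.append(messages[off])
        current = off + 1
    if commit:
        committed = current

poll_and_process(3, commit=True)    # offsets 0-2 processed and committed
poll_and_process(3, commit=False)   # offsets 3-5 processed, "crash" before commit
# after a restart the consumer resumes from the committed offset (3),
# so offsets 3-5 are processed a second time
poll_and_process(3, commit=True)
```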
Fault Tolerance
Fault tolerance is achieved using the replication factor. If a topic is created with a replication factor of, say, 2, its messages are duplicated across two brokers, so that if one broker goes down the other still holds the messages for further processing, giving high availability.
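The idea of spreading each partition's replicas over distinct brokers can be sketched as below. This is a simplified round-robin illustration, not Kafka's actual replica-assignment algorithm, and the broker IDs are made up:

```python
def assign_replicas(num_partitions: int, replication_factor: int, brokers: list) -> dict:
    """Place each partition's replicas on distinct brokers,
    round-robin style, so no broker holds two copies of the
    same partition."""
    assert replication_factor <= len(brokers), "RF cannot exceed broker count"
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

# topic with 3 partitions and replication factor 2 on brokers 101-103
layout = assign_replicas(3, 2, [101, 102, 103])
```

With this layout the loss of any single broker still leaves one live copy of every partition, which is exactly the high-availability guarantee described above.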