
Apache Kafka - Getting Started!

  • anithamca
  • Nov 12, 2021
  • 3 min read

Updated: Dec 9, 2021

This blog is written with the intention of helping students kick-start the basics of Kafka in a few minutes.

Apache Kafka is a community distributed streaming platform capable of handling trillions of events a day.

Kafka Basic Terminologies

The most popular definition of Kafka is a 'distributed commit log'. Microservice applications exchange events using topics/streams in real time; data is processed immediately, so it can be used for real-time analytics and recommendations.

Kafka provides multi-producer and multi-consumer support. In legacy messaging systems, once a message is consumed from a topic by one consumer it is lost, whereas Kafka retains the message so that multiple consumers can read it. This moves application-to-application (A2A) integration from a tightly coupled to a loosely coupled architecture. A few terminologies:

Producer - Application that sends messages to Kafka

Consumer - Application that receives messages from Kafka

Broker - Kafka server

Cluster - Group of computers running Kafka brokers

Topic - Name of a Kafka stream

Partition - Part of a topic

Offset - Unique ID of a message within a partition

Consumer Group - Group of consumers acting as a single unit.


Fig 1: Kafka Terminologies


Fig 2: Kafka Cluster

Imagine a simple topic xyz created in Kafka with 3 partitions. Messages produced will get distributed across all 3 partitions, and Kafka ensures that the same message will not be read by another consumer from the same consumer group. Messages are persisted in disk-based storage for a pre-configured amount of time, so even if a consumer crashes or fails, the messages remain available for continued processing.



Fig 3: Simple Topic xyz
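
As a rough, non-authoritative sketch, a topic like xyz can also be created programmatically with Kafka's AdminClient; the broker address and replication factor of 1 below assume the local single-broker setup described later in this post.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicXyz {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumes a local broker on the default port 9092; adjust for your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic xyz with 3 partitions and replication factor 1 (single-broker local setup).
            NewTopic xyz = new NewTopic("xyz", 3, (short) 1);
            admin.createTopics(Collections.singletonList(xyz)).all().get();
            System.out.println("Created topic xyz with 3 partitions");
        }
    }
}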

Kafka originated at LinkedIn for activity tracking. Stream processing helps in reading and transforming messages, which opens up opportunities for various big data and event-driven use cases. Twitter, Uber, Yahoo, Netflix and a huge number of other applications use Kafka in the real world. Brokers, ZooKeeper, producers and consumers are the major components of Kafka.

Kafka High Level Architecture Diagram


Fig 4: E2E High Level Architecture of Kafka

A standalone or local Kafka setup will include only ZooKeeper and a Kafka cluster with 1 or more brokers. A cloud version of Kafka is also provided by Confluent, which includes Schema Registry, Kafka Streams, Kafka Connect and ksqlDB as out-of-the-box plugins or libraries.

A Kafka Record



Fig 5: Kafka Record attributes
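
To make the record attributes in Fig 5 concrete, here is a minimal sketch with the plain Kafka Java client; the topic name, key, value and header shown are illustrative placeholders.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internals.RecordHeader;

import java.nio.charset.StandardCharsets;

public class KafkaRecordExample {
    public static void main(String[] args) {
        // A record carries: topic, optional partition, optional timestamp, key, value and headers.
        ProducerRecord<String, String> record = new ProducerRecord<>(
                "ani-topic",                   // topic
                null,                          // partition (null lets the partitioner decide, typically by key hash)
                System.currentTimeMillis(),    // timestamp
                "order-42",                    // key
                "{\"status\":\"CREATED\"}"     // value
        );
        // Optional headers carry extra metadata such as trace or source identifiers.
        record.headers().add(new RecordHeader("source", "demo-app".getBytes(StandardCharsets.UTF_8)));

        System.out.println(record);
    }
}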

Setting up Kafka Local Installation

- Standard Kafka

Note: Homebrew is used as the package manager for Mac.


Standard Kafka

1. Install Brew

2. Download and install JDK 11

3. Download the Kafka binary - https://kafka.apache.org/downloads

4. Install the Kafka CLI - $ brew install kafka

5. Edit the ZooKeeper & Kafka configs using a text editor (go to the config folder):

zookeeper.properties: dataDir=/your/path/to/data/zookeeper

server.properties: log.dirs=/your/path/to/data/kafka

6. Start ZooKeeper

$cd /Users/<yourname>/Documents/kafka_2.12-2.1.1/bin

./zookeeper-server-start.sh ../config/zookeeper.properties

7. Start Kafka, which communicates with ZooKeeper

$ ./kafka-server-start.sh ../config/server.properties

Apache Kafka CLI

Below are some CLI commands to quickly try Kafka record production and consumption.

$kafka-topics --list --zookeeper localhost:2181

$kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ani-topic

$kafka-topics --describe --zookeeper localhost:2181 --topic ani-topic

$kafka-console-consumer --bootstrap-server localhost:9092 --topic ani-topic

$kafka-console-producer --broker-list localhost:9092 --topic ani-topic

<<produce some messages and validate that the consumer in the other terminal is able to consume them>>

$kafka-topics --delete --zookeeper localhost:2181 --topic ani-topic

Note: For delete to work, add the following to the Kafka server.properties:

delete.topic.enable=true

Java/Spring Boot Client for Kafka Producer/Consumer

Configuration APIs

Major APIs - Admin API, Streams API, Connect API, Producer API, Consumer API
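
Below is a minimal sketch of the Producer API using the plain Kafka Java client (a Spring Boot application would typically wrap this with spring-kafka's KafkaTemplate instead). The broker address and topic name are assumed from the local setup above; a matching consumer example appears under Offset Management below.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key always land on the same partition.
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>("ani-topic", "key-" + i, "hello kafka " + i),
                        (metadata, exception) -> {
                            if (exception == null) {
                                System.out.printf("Sent to partition %d at offset %d%n",
                                        metadata.partition(), metadata.offset());
                            } else {
                                exception.printStackTrace();
                            }
                        });
            }
            producer.flush();
        }
    }
}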

Confluent Kafka

Kafka UI Tools

  • Conduktor - https://www.conduktor.io/download/ (licensed; developers can use it for a local single-cluster setup)

  • Kafdrop - https://towardsdatascience.com/kafdrop-e869e5490d62

  • Kafka Tool - https://www.kafkatool.com/

  • Confluent - Confluent Control Center (Cloud)

Offset Management in Kafka

Current Offset - the position of records already sent to the consumer.

Committed Offset - the offset that has already been processed and confirmed by a consumer.

Properties: enable.auto.commit (true by default), auto.commit.interval.ms (5 seconds by default)

Manual approach - commitSync(), commitAsync(), as sketched below.
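
A minimal sketch of the manual approach, assuming the local broker and the ani-topic created earlier (the group id is a placeholder): auto-commit is disabled and commitSync() is called only after the polled batch has been processed.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ani-consumer-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Take control of the committed offset instead of the default 5-second auto-commit.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ani-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit after the batch is processed; commitAsync() is the non-blocking variant.
                consumer.commitSync();
            }
        }
    }
}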

Fault Tolerance

Fault tolerance is achieved using the replication factor. If a topic is created with a replication factor of, say, 2, messages get replicated across two brokers, so that if one broker goes down the other still holds the messages for further processing, giving high availability.
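
As an illustrative sketch (assuming the local broker from the setup above and the ani-topic created earlier), the AdminClient can describe a topic to show which brokers hold each partition's replicas and which of them are currently in sync (the ISR):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;

public class DescribeTopicReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin.describeTopics(Collections.singletonList("ani-topic"))
                    .all().get().get("ani-topic");
            // For each partition print the leader broker, all replica brokers and the in-sync replicas (ISR).
            description.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}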

 
 
 
