Writing Kafka Consumers using Java

Consumers consume the messages published by Kafka producers.

 

Consumer API Basics

  • A consumer subscribes to a specific topic on the Kafka broker in order to consume its messages.

  • While subscribing, the consumer connects to any of the live nodes, requests metadata about the leaders for the partitions of the topic, and then communicates directly with the lead broker to receive messages, specifying the message offset in each request.

  • The Kafka consumer works in the pull model and pulls all available messages after its current position in the Kafka log.

  • Within a consumer group, each Kafka partition is consumed by only one consumer.

  • As messages in a partition are consumed, the consumer advances its offset to the next message to be consumed from that partition.

  • Like producers, consumers can also be different in nature, such as

    • applications doing real-time or near real-time analysis, 

    • applications with NoSQL or data warehousing solutions,

    • backend services,

    • consumers for Hadoop, or

    • other subscriber-based solutions.

  • These consumers can also be implemented in different languages such as Java, C, and Python.
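The pull model and offset tracking described in the list above can be sketched without a broker. This is a minimal in-memory illustration, not Kafka code: `PartitionLog` and `PullingConsumer` are hypothetical names introduced here, and a real consumer would fetch over the network from the lead broker.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one partition's log; not part of the Kafka API. */
class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    void append(String message) {
        messages.add(message);
    }

    /** Returns a copy of every message at or after the given offset. */
    List<String> readFrom(long offset) {
        int start = (int) Math.min(Math.max(offset, 0), messages.size());
        return new ArrayList<>(messages.subList(start, messages.size()));
    }
}

/** Hypothetical consumer that pulls from the log and tracks its own position. */
class PullingConsumer {
    private final PartitionLog log;
    private long offset = 0; // offset of the next message to be consumed

    PullingConsumer(PartitionLog log) {
        this.log = log;
    }

    /** Pulls all messages after the current position, then advances the offset. */
    List<String> poll() {
        List<String> batch = log.readFrom(offset);
        offset += batch.size();
        return batch;
    }

    long position() {
        return offset;
    }
}

class PullModelDemo {
    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("m0");
        log.append("m1");
        PullingConsumer consumer = new PullingConsumer(log);
        System.out.println(consumer.poll()); // first pull sees both messages
        log.append("m2");
        System.out.println(consumer.poll()); // second pull sees only the new one
    }
}
```

The consumer, not the log, decides when to ask for data and where to start, which is the essence of the pull model.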

 

Kafka consumer APIs

Kafka provides two types of API for Java consumers:

  • High-level API

  • Low-level API

 

High-level consumer API

The high-level consumer API hides broker details from the consumer and provides an abstraction over the low-level implementation.

The high-level consumer API is used when only data is needed and the handling of message offsets is not required. 

The high-level consumer stores the last offset read from a specific partition in ZooKeeper.

This offset is stored based on the consumer group name provided to Kafka at the beginning of the process.

The consumer group name is unique across the Kafka cluster, and starting a new consumer with a group name that is already in use may cause ambiguous behavior in the system. Avoid this where possible, or shut down the existing consumers before starting new ones under an existing consumer group name.
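To see why the group name matters, here is a minimal sketch of offsets keyed by consumer group, topic, and partition. `GroupOffsetStore` is a hypothetical in-memory stand-in for the ZooKeeper-backed storage described above; the key layout is an assumption made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the offset store; not part of the Kafka API. */
class GroupOffsetStore {
    private final Map<String, Long> offsets = new HashMap<>();

    // Illustrative key layout: one entry per group, topic, and partition.
    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    void commit(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    /** Returns the last committed offset, or 0 if the group has none yet. */
    long fetch(String group, String topic, int partition) {
        return offsets.getOrDefault(key(group, topic, partition), 0L);
    }
}

class GroupOffsetDemo {
    public static void main(String[] args) {
        GroupOffsetStore store = new GroupOffsetStore();
        store.commit("billing", "orders", 0, 42L);

        // A new consumer that reuses the group name resumes from the stored offset.
        System.out.println(store.fetch("billing", "orders", 0));
        // A consumer in a different group starts from its own (empty) state.
        System.out.println(store.fetch("audit", "orders", 0));
    }
}
```

Because the stored position belongs to the group rather than to an individual process, two unrelated applications that accidentally share a group name would also share, and overwrite, each other's offsets.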

 

Important high-level API classes for working with Java are:

  • kafka.javaapi.consumer.ZookeeperConsumerConnector implements the ConsumerConnector interface provided by Kafka.

  • kafka.consumer.KafkaStream class objects are returned by the createMessageStreams call from the ConsumerConnector implementation.

  • kafka.consumer.ConsumerConfig class encapsulates the property values required for establishing the connection with ZooKeeper, such as the ZooKeeper URL, ZooKeeper session timeout, and ZooKeeper sync time.
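The properties that ConsumerConfig encapsulates can be assembled with plain java.util.Properties. The sketch below uses the property names of the Kafka 0.8-era high-level consumer; the values, the `HighLevelConsumerProps` class, and its `build` helper are illustrative assumptions, and the Kafka classes themselves are mentioned only in a comment so that the example runs without the Kafka client on the classpath.

```java
import java.util.Properties;

class HighLevelConsumerProps {
    /** Builds the properties that kafka.consumer.ConsumerConfig would wrap. */
    static Properties build(String zookeeperUrl, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeperUrl);       // ZooKeeper URL
        props.put("group.id", groupId);                     // consumer group name
        props.put("zookeeper.session.timeout.ms", "500");   // illustrative value
        props.put("zookeeper.sync.time.ms", "250");         // illustrative value
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:2181", "demo-group");
        props.list(System.out);
        // With the Kafka 0.8 client available, the next step would be to pass
        // these properties to new kafka.consumer.ConsumerConfig(props) and hand
        // the config to kafka.consumer.Consumer.createJavaConsumerConnector(...).
    }
}
```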

 

Low-level consumer API

The low-level consumer API is also known as the “simple consumer API”.

The low-level consumer API is stateless.

The low-level consumer API lets consumers set the message offset with every request sent to the broker and maintain the metadata on the consumer’s end.

It can be used by both online and offline consumers, such as Hadoop.

Consumers can also perform multiple reads for the same message or manage transactions to ensure the message is consumed only once.
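The following sketch illustrates what statelessness means in practice: the caller passes the offset with every request, so reading the same message twice is simply a matter of sending the same offset again. `StatelessBroker` is a hypothetical in-memory stand-in, not the SimpleConsumer API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stateless broker: it remembers nothing about its callers. */
class StatelessBroker {
    private final List<String> log;

    StatelessBroker(List<String> log) {
        this.log = log;
    }

    /** Returns up to maxMessages starting at the caller-supplied offset. */
    List<String> fetch(long offset, int maxMessages) {
        int from = (int) Math.min(Math.max(offset, 0), log.size());
        int to = (int) Math.min((long) from + Math.max(maxMessages, 0), log.size());
        return log.subList(from, to);
    }
}

class LowLevelFetchDemo {
    public static void main(String[] args) {
        StatelessBroker broker =
                new StatelessBroker(Arrays.asList("m0", "m1", "m2", "m3"));

        long offset = 1; // the position is kept on the consumer side
        List<String> batch = broker.fetch(offset, 2);
        System.out.println(batch);

        // Same offset, same messages: a re-read needs no cooperation from the broker.
        System.out.println(broker.fetch(offset, 2));

        // The consumer decides when to advance its own position.
        offset += batch.size();
        System.out.println(broker.fetch(offset, 2));
    }
}
```

Advancing the offset only after a batch has been fully processed is one way a consumer could approach exactly-once handling on its own side.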

Consumers first query a live broker to find out the details of the lead broker. Information about the live broker can be passed to the consumers either through a properties file or from the command line.

 

Important low-level API classes are:

  • kafka.javaapi.consumer.SimpleConsumer class - provides a connection to the lead broker for fetching the messages from the topic and methods to get the topic metadata and the list of offsets.

  • kafka.javaapi.TopicMetadataResponse class - The topicsMetadata() method of this class is used to find out metadata about the topic of interest from the lead broker.

  • kafka.api.OffsetRequest class - For reading from a message partition, the kafka.api.OffsetRequest class defines two constants, EarliestTime and LatestTime, to find the beginning of the data in the logs and the start of the stream of new messages. These constants also help consumers to track which messages have already been read.
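As a rough illustration of how the two constants are used, the sketch below resolves them to concrete offsets for a single partition. The sentinel values -2 and -1 mirror what EarliestTime and LatestTime are understood to return in Kafka 0.8, which should be treated as an assumption here; `OffsetSentinels` and `resolve` are hypothetical names, and the real API also accepts timestamps, which this sketch does not handle.

```java
class OffsetSentinels {
    // Assumed to match kafka.api.OffsetRequest.EarliestTime() and LatestTime()
    // in Kafka 0.8; redefined locally so the sketch runs without Kafka.
    static final long EARLIEST_TIME = -2L;
    static final long LATEST_TIME = -1L;

    /**
     * Maps a sentinel to a starting offset for one partition.
     * logStartOffset is the oldest offset still retained; logEndOffset is the
     * offset that the next appended message will receive.
     */
    static long resolve(long whichTime, long logStartOffset, long logEndOffset) {
        if (whichTime == EARLIEST_TIME) {
            return logStartOffset; // replay everything still in the log
        }
        if (whichTime == LATEST_TIME) {
            return logEndOffset;   // skip history, read only new messages
        }
        throw new IllegalArgumentException("unsupported sentinel: " + whichTime);
    }

    public static void main(String[] args) {
        long start = 100L; // illustrative: older segments were already deleted
        long end = 250L;
        System.out.println(resolve(EARLIEST_TIME, start, end));
        System.out.println(resolve(LATEST_TIME, start, end));
    }
}
```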

Below are a few more important low-level API classes for building different request objects:

  • kafka.javaapi.OffsetRequest

  • kafka.javaapi.OffsetFetchRequest

  • kafka.javaapi.OffsetCommitRequest

  • kafka.javaapi.TopicMetadataRequest
