Apache Kafka Tutorials
Welcome to Apache Kafka Tutorials. The objective of these tutorials is to provide an in-depth understanding of Apache Kafka.
In addition to free Apache Kafka tutorials, we will cover common interview questions, issues, and how-tos of Apache Kafka.
Introduction of Apache Kafka
Apache Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn Corporation and later became part of the Apache project. Kafka is, by design, a fast, scalable, distributed, partitioned, and replicated commit log service.
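The "partitioned commit log" idea can be sketched as a toy in-memory model. This is only an illustration of the concept, not Kafka's actual implementation or API: the class name `CommitLog` and its methods are invented here, and replication across brokers is omitted entirely.

```python
class CommitLog:
    """Toy in-memory sketch of a partitioned commit log. Real Kafka also
    replicates each partition across brokers; that part is omitted here."""

    def __init__(self, num_partitions=3):
        # One append-only list per partition; a record's offset is simply
        # its index in that list.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition,
        # which preserves per-key ordering.
        partition = hash(key) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        return self.partitions[partition][offset]
```

Because the log is append-only, an offset permanently identifies a record within its partition, which is what lets consumers re-read old messages at will.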
Apache Kafka differs from a traditional messaging system in that:
-It offers high throughput for both publishing and subscribing.
-It supports multiple subscribers and automatically rebalances consumers during failure.
-It persists messages on disk and can therefore be used for batched consumption such as ETL, in addition to real-time applications.
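The multi-subscriber point above is worth making concrete: because messages persist in the log, each consumer simply tracks its own read offset, so many subscribers can read the same data independently. The sketch below is a hypothetical illustration of that pattern, not Kafka's consumer API.

```python
class Consumer:
    """Each subscriber keeps its own offset into the shared log, so many
    consumers read the same messages independently (pub-sub), unlike a
    traditional queue where a delivered message is removed."""

    def __init__(self, log):
        self.log = log    # shared, append-only message list
        self.offset = 0   # this consumer's private read position

    def poll(self):
        # Return the next unread message, or None if caught up.
        if self.offset < len(self.log):
            msg = self.log[self.offset]
            self.offset += 1
            return msg
        return None

log = ["pageview", "search", "click"]
a, b = Consumer(log), Consumer(log)
```

Here both `a` and `b` start at offset 0 and each receives every message; delivering to one does not consume it for the other.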
Kafka is one of those systems that is very simple to describe at a high level, but has incredible depth of technical detail when you dig deeper. The Kafka documentation does an excellent job of explaining the many design and implementation subtleties in the system, so we will not attempt to explain them all here.
These core tutorials will help you learn the fundamentals of Apache Kafka. For an in-depth understanding and practical experience, explore Online Apache Kafka Training.
Following are a few benefits of Kafka −
-Reliability − Kafka is distributed, partitioned, replicated, and fault tolerant.
-Scalability − The Kafka messaging system scales easily without downtime.
-Durability − Kafka uses a distributed commit log, which means messages persist on disk, so the system is durable.
-Performance − Kafka has high throughput for both publishing and subscribing messages. It maintains stable performance even when many terabytes of messages are stored.
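The scalability benefit comes largely from partitioning: the partitions of a topic are divided among the consumers in a group, so adding consumers (up to the partition count) spreads the read load. The round-robin assignment below is a simplified sketch; Kafka's real partition assignors are more sophisticated, and the function name here is invented for illustration.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch of how a consumer group might split a topic's
    partitions among its members. Each partition goes to exactly one
    consumer in the group, so consumers share the work without overlap."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With four partitions and two consumers, each consumer handles two partitions; add a third consumer and the work redistributes, which is the essence of scaling without downtime.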
Uses of Kafka
-Website activity tracking: The web application sends events such as page views and searches to Kafka, where they become available for real-time processing, dashboards, and offline analytics in Hadoop.
-Operational metrics: Alerting and reporting on operational metrics. One particularly fun example is having Kafka producers and consumers occasionally publish their message counts to a special Kafka topic; a service can be used to compare counts and alert if data loss occurs.
-Log aggregation: Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers, including Hadoop and Apache Solr.
-Stream processing: A framework such as Spark Streaming reads data from a topic, processes it and writes processed data to a new topic where it becomes available for users and applications. Kafka’s strong durability is also very useful in the context of stream processing.
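The stream-processing pattern described above (read from a topic, transform, write to a new topic) can be sketched in a few lines. This is a bare illustration of the loop only; real frameworks such as Kafka Streams or Spark Streaming add state, windowing, and fault tolerance on top, and the names below are hypothetical.

```python
def process_stream(source_topic, transform):
    """Minimal sketch of stream processing: consume each record from a
    source topic, apply a transformation, and emit the result to a new
    topic where downstream applications can read it."""
    output_topic = []
    for record in source_topic:
        output_topic.append(transform(record))
    return output_topic

# Example: enrich raw click events with a processed flag.
clicks = [{"page": "/home"}, {"page": "/docs"}]
enriched = process_stream(clicks, lambda r: {**r, "processed": True})
```

Because the source topic is a durable log, a crashed processor can rewind to its last committed offset and reprocess, which is why Kafka's durability matters so much for stream processing.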