Building a Data Pipeline with Kafka, Spark Streaming and Cassandra
1. Overview
Apache Kafka is a scalable, high-performance, low-latency platform that allows reading and writing streams of data like a messaging system. We can start with Kafka in Java fairly easily. Spark Streaming is part of the Apache Spark platform that…

Exactly Once Processing in Kafka
1. Overview
In this tutorial, we’ll look at how Kafka ensures exactly-once delivery between producer and consumer applications through the newly introduced Transactional API. Additionally, we’ll use this API to implement transactional producers and consumers to achieve end-to-end exactly-once delivery in a WordCount example. 2.…