Publish-subscribe messaging rethought as a distributed commit log

obulpathi/kafka

kafka

  • Kafka has very good performance, but its delivery guarantees have some rough edges
  • Messages are produced and consumed in small batches, which is how Kafka achieves its high throughput
  • Checkpointing on both the producer and consumer side is coarse-grained (once per batch, not once per message), which causes many problems for exactly-once guarantees
  • In this setup, Kafka brokers keep no consumer state; consumer offsets and cluster metadata are stored in ZooKeeper
  • If a node (producer, consumer, or Kafka broker) crashes and is restarted, it resumes from the last state checkpointed in ZooKeeper
  • Unless this is carefully worked around in code, it leads to duplicated or dropped messages
  • Using Kafka to move files from an FTP server into HDFS for processing by Spark is somewhat redundant; transferring the files directly from the FTP server to HDFS is simpler
  • Kafka works well as real-time storage for analytics solutions
  • For storing batch data, HDFS is much better suited than Kafka
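The crash-and-resume behaviour described in the bullets above can be illustrated with a small, self-contained simulation in plain Python (no Kafka client involved). It is a sketch under stated assumptions, not Kafka's API: `Checkpoint` is a hypothetical stand-in for a consumer offset kept in ZooKeeper, and `consume`, `ListSink`, `IdempotentSink` and `run` are illustrative names. It assumes a single partition with dense, sequential offsets. Checkpointing once per batch after processing replays part of a batch on restart (duplicates); checkpointing before processing skips it (drops). One common workaround is an idempotent sink that remembers the last offset it applied.

```python
from dataclasses import dataclass
from typing import List, Optional


class SimulatedCrash(Exception):
    """Raised to simulate a process dying in the middle of a batch."""


@dataclass
class Checkpoint:
    # Hypothetical stand-in for the consumer offset checkpointed in ZooKeeper.
    offset: int = 0


class ListSink:
    """Naive sink: appends every message it is given."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def write(self, offset: int, msg: str) -> None:
        self.items.append(msg)


class IdempotentSink(ListSink):
    """Skips offsets it has already written. A real sink would have to persist
    next_offset atomically with its output (assumption, not shown here)."""

    def __init__(self) -> None:
        super().__init__()
        self.next_offset = 0

    def write(self, offset: int, msg: str) -> None:
        if offset < self.next_offset:
            return  # replayed after a crash: already applied, so ignore it
        self.items.append(msg)
        self.next_offset = offset + 1


def consume(log: List[str], cp: Checkpoint, sink: ListSink, batch_size: int,
            commit_before: bool, crash_after: Optional[int] = None) -> None:
    """Consume `log` in batches, checkpointing once per batch (coarse-grained).

    commit_before=True checkpoints the batch end before processing it
    (at-most-once: a crash drops the rest of the batch). commit_before=False
    checkpoints after the batch (at-least-once: a crash replays the batch).
    crash_after simulates a crash once that many messages have been written.
    """
    processed = 0
    while cp.offset < len(log):
        start = cp.offset
        end = min(start + batch_size, len(log))
        if commit_before:
            cp.offset = end
        for i in range(start, end):
            if crash_after is not None and processed == crash_after:
                raise SimulatedCrash()
            sink.write(i, log[i])
            processed += 1
        if not commit_before:
            cp.offset = end


def run(commit_before: bool, sink: ListSink) -> List[str]:
    log = [f"m{i}" for i in range(10)]
    cp = Checkpoint()  # survives the crash, like state kept in ZooKeeper
    try:
        consume(log, cp, sink, batch_size=4, commit_before=commit_before, crash_after=6)
    except SimulatedCrash:
        pass  # "restart" the consumer from the last checkpoint
    consume(log, cp, sink, batch_size=4, commit_before=commit_before)
    return sink.items


if __name__ == "__main__":
    print(run(commit_before=False, sink=ListSink()))        # m4, m5 duplicated
    print(run(commit_before=True, sink=ListSink()))         # m6, m7 dropped
    print(run(commit_before=False, sink=IdempotentSink()))  # m0..m9 exactly once
```

With a batch size of 4 and a crash after 6 messages, committing after each batch yields m0 to m5, then m4 and m5 again on restart; committing before each batch loses m6 and m7. The idempotent sink turns the at-least-once replay into an exactly-once result, but only if its offset bookkeeping is stored atomically with the data it writes.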