kafka_ecosystem
From: https://dzone.com/articles/kafka-detailed-design-and-ecosystem
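The core elements of Apache Kafka are brokers, topics, logs, partitions, and clusters, together with related tools such as MirrorMaker. The Kafka ecosystem consists of Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and Schema Registry. Most of the ecosystem components beyond the core come from Confluent and are not part of Apache.
- Kafka Streams is an API for transforming, aggregating, and processing records from data streams and for producing derived streams.
- Kafka Connect is an API for connectors that create reusable producers and consumers (for example, a stream of changes from DynamoDB).
- Kafka REST Proxy exposes producers and consumers over REST (HTTP).
- Schema Registry manages the schemas of Kafka records that use Avro.
- Kafka MirrorMaker replicates the data of one cluster to another cluster.
- Kafka Connect Sources are the sources of Kafka records, and Kafka Connect Sinks are the destinations of those records.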
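To make the core vocabulary (topics, partitions, logs, offsets) concrete, here is a minimal in-memory sketch. It is a toy model and not the Kafka client API: the `ToyTopic` class and its methods are hypothetical names, and `crc32` stands in for the murmur2 hash that the Java client's default partitioner applies to keyed records.

```python
import zlib


class ToyTopic:
    """A topic modeled as a list of append-only partition logs (hypothetical class)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A stable hash sends the same key to the same partition every time.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a record to its partition log and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


topic = ToyTopic("orders", num_partitions=3)
first = topic.append("customer-1", "created")
second = topic.append("customer-1", "paid")
```

Because both records share a key, they land in the same partition with consecutive offsets, which is what gives per-key ordering in this model.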
Kafka Connect
Kafka has a built-in framework called Kafka Connect for writing sources and sinks that either continuously ingest data into Kafka or continuously export data from Kafka into external systems.
The connectors themselves for different applications or data systems are federated and maintained separately from the main code base. You can find a list of available connectors at the Kafka Connect Hub.
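As an illustration of how a connector is typically registered, the sketch below builds the JSON payload for the Kafka Connect REST API and prepares a POST request without sending it. The worker address `localhost:8083`, the connector name, the file path, and the topic name are all assumptions chosen for the example. The file source connector class ships with Apache Kafka as a demo connector, though it may need to be added to the worker's plugin path on recent releases.

```python
import json
import urllib.request

# Assumed address of a Connect worker's REST interface (8083 is the default port).
CONNECT_URL = "http://localhost:8083/connectors"


def file_source_connector(name, path, topic):
    """Build the payload that registers a file source connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,    # file whose lines are read as records
            "topic": topic,  # topic the records are written to
        },
    }


def build_request(payload, url=CONNECT_URL):
    """Prepare (but do not send) the POST request for the Connect REST API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names and paths, for illustration only.
payload = file_source_connector("demo-file-source", "/tmp/input.txt", "demo-topic")
request = build_request(payload)
# To register the connector against a running worker:
# urllib.request.urlopen(request)
```

A sink connector is configured the same way, with a sink connector class and a `topics` entry naming the topics to read from.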
Distributions & Packaging
- Confluent Platform - http://confluent.io/product/. Downloads - http://confluent.io/downloads/.
- Cloudera Kafka source (0.11.0) https://github.com/cloudera/kafka/tree/cdh5-1.0.1_3.1.0 and release http://archive.cloudera.com/kafka/parcels/3.1.0/
- Hortonworks Kafka source and release http://hortonworks.com/hadoop/kafka/
- Stratio Kafka source for ubuntu http://repository.stratio.com/sds/1.1/ubuntu/13.10/binary/ and for RHEL http://repository.stratio.com/sds/1.1/RHEL/
- IBM Event Streams - https://www.ibm.com/cloud/event-streams - Apache Kafka on premise and the public cloud
- Strimzi - http://strimzi.io/ - Apache Kafka Operator for Kubernetes and Openshift. Downloads and Helm Chart - https://github.com/strimzi/strimzi-kafka-operator/releases/latest
- TIBCO Messaging - Apache Kafka Distribution - https://www.tibco.com/products/apache-kafka Downloads - https://www.tibco.com/products/tibco-messaging/downloads
Stream Processing
- Kafka Streams - the built-in stream processing library of the Apache Kafka project
- Kafka Streams Ecosystem:
  - Complex Event Processing (CEP): https://github.com/fhussonnois/kafkastreams-cep.
- Storm - A stream-processing framework.
- Samza - A YARN-based stream processing framework.
- Storm Spout - Consume messages from Kafka and emit as Storm tuples
- Kafka-Storm - Kafka 0.8, Storm 0.9, Avro integration
- SparkStreaming - Kafka receiver supports Kafka 0.8 and above
- Flink - Apache Flink has an integration with Kafka
- IBM Streams - A stream processing framework with Kafka source and sink to consume and produce Kafka messages
- Spring Cloud Stream - a framework for building event-driven microservices, Spring Cloud Data Flow - a cloud-native orchestration service for Spring Cloud Stream applications
- Apache Apex - Stream processing framework with connectors for Kafka as source and sink.
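The frameworks above share one basic idea: read records from a stream, transform and aggregate them, and emit a derived stream. The following is a minimal sketch of that idea using plain Python generators. It does not use the API of any framework listed here, and the function names are made up for the example.

```python
from collections import Counter


def split_words(lines):
    """Transform: turn a stream of text lines into a stream of lowercase words."""
    for line in lines:
        for word in line.lower().split():
            yield word


def running_counts(words):
    """Aggregate: emit an updated (word, count) record for every input word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


# A small finite list stands in for an unbounded input stream.
source = ["Kafka streams", "kafka connect"]
derived = list(running_counts(split_words(source)))
# derived == [("kafka", 1), ("streams", 1), ("kafka", 2), ("connect", 1)]
```

Each output record is an update to the running count, so the derived stream acts as a changelog of the aggregate rather than a final result.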
Hadoop Integration
- Confluent HDFS Connector - A sink connector for the Kafka Connect framework for writing data from Kafka to Hadoop HDFS
- Camus - LinkedIn’s Kafka=>HDFS pipeline. This one is used for all data at LinkedIn, and works great.
- Kafka Hadoop Loader - A different take on Hadoop loading functionality from what is included in the main distribution.
- Flume - Contains Kafka source (consumer) and sink (producer)
- KaBoom - A high-performance HDFS data loader
Database Integration
- Confluent JDBC Connector - A source connector for the Kafka Connect framework for writing data from RDBMS (e.g. MySQL) to Kafka
- [Oracle Golden Gate Connector](https://java.net/projects/oracledi/downloads/directory/GoldenGate/Oracle%20GoldenGate%20Adapter%20for%20Kafka%20Connect) - Source connector that collects CDC operations via Golden Gate and writes them to Kafka
Search and Query
- ElasticSearch - This project, Kafka Standalone Consumer, reads messages from Kafka, then processes and indexes them in ElasticSearch. There are also several Kafka Connect connectors for ElasticSearch.
- Presto - The Presto Kafka connector allows you to query Kafka in SQL using Presto.
- Hive - Hive SerDe that allows querying Kafka (Avro only for now) using Hive SQL
Management Consoles
- Kafka Manager - A tool for managing Apache Kafka.
- kafkat - Simplified command-line administration for Kafka brokers.
- Kafka Web Console - Displays information about your Kafka cluster including which nodes are up and what topics they host data for.
- Kafka Offset Monitor - Displays the state of all consumers and how far behind the head of the stream they are.
- Capillary - Displays the state and deltas of Kafka-based Apache Storm topologies. Supports Kafka >= 0.8. It also provides an API for fetching this information for monitoring purposes.
- Doctor Kafka - Service for cluster auto healing and workload balancing.
- Cruise Control - Fully automates the dynamic workload rebalancing and self-healing of a Kafka cluster.
- Burrow - Monitoring companion that provides consumer lag checking as a service without the need for specifying thresholds.
- Chaperone - An audit system that monitors the completeness and latency of data streams.
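Several of the tools above report consumer lag, that is, how far a consumer group is behind the head of each partition. A minimal sketch of the raw calculation follows. It assumes the log end offsets and committed offsets have already been fetched by some other means, and it is not the evaluation logic of Burrow or any other tool listed. The function name and the sample numbers are made up.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption: a partition with no committed offset has not been consumed at all.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 120, 1: 95, 2: 40}
committed = {0: 100, 1: 95}
lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

With these sample numbers, partition 1 is fully caught up while partitions 0 and 2 account for all of the lag.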
AWS Integration
- Automated AWS deployment
- Kafka -> S3 Mirroring tool from Pinterest.
- Alternative Kafka->S3 Mirroring tool
Logging
- syslog (1M)
- syslog producer - A producer that supports both raw data and protobuf with metadata for deep analytics usage.
- syslog-ng (https://syslog-ng.org/) is one of the most widely used open source log collection tools, capable of filtering, classifying, parsing log data and forwarding it to a wide variety of destinations. Kafka is a first-class destination in the syslog-ng tool; details on the integration can be found at https://czanik.blogs.balabit.com/2015/11/kafka-and-syslog-ng/ .
- klogd - A python syslog publisher
- klogd2 - A java syslog publisher
- Tail2Kafka - A simple log tailing utility
- Fluentd plugin - Integration with Fluentd
- Remote log viewer
- LogStash integration - Integration with LogStash and Fluentd
- Syslog Collector written in Go
- Klogger - A simple proxy service for Kafka.
- fuse-kafka: A file system logging agent based on Kafka
- omkafka - Another syslog integration, this one written in C and using the librdkafka library
- logkafka - Collect logs and send lines to Apache Kafka
Flume - Kafka plugins
- Flume Kafka Plugin - Integration with Flume
- Kafka as a sink and source in Flume - Integration with Flume
Metrics
- Mozilla Metrics Service - A Kafka and Protocol Buffers based metrics and logging system
- Ganglia Integration
- SPM for Kafka
- Coda Hale Metric Reporter to Kafka
- kafka-dropwizard-reporter - Register built-in Kafka client and stream metrics to Dropwizard Metrics
Packaging and Deployment
- RPM packaging
- Debian packaging - https://github.com/tomdz/kafka-deb-packaging
- Puppet Integration
- Dropwizard packaging
Kafka Camel Integration
Misc.
- Kafka Websocket - A proxy that interoperates with websockets for delivering Kafka data to browsers.
- KafkaCat - A native, command line producer and consumer.
- Kafka Mirror - An alternative to the built-in mirroring tool
- Ruby Demo App
- Apache Camel Integration
- Infobright integration
- Riemann Consumer of Metrics
- stormkafkamom - curses-based tool which displays state of Apache Storm based Kafka consumers (Kafka 0.7 only).
- uReplicator - Provides the ability to replicate across Kafka clusters in other data centers
- Mirus - A tool for distributed, high-volume replication between Apache Kafka clusters based on Kafka Connect