MirrorMaker 2.0 is a robust data replication utility for Apache Kafka. It acts as both a consumer and a producer across multiple Kafka clusters, so users can easily and reliably copy data from one cluster to another. This increases the resilience of Kafka-centric architectures.
Reasons to replicate your Kafka cluster data
Kafka stores data only for a configured retention period, so it is not a database in the traditional sense. Why, then, should you worry about replicating that fleeting data? Because data replication between your Kafka clusters can add flexibility, performance and reliability to your core data infrastructure. Large companies with huge data volumes in particular stand to benefit.
1. Disaster recovery
The best understood and most important scenario for replicating data between Kafka clusters is disaster recovery. Many businesses rely on Kafka as a cornerstone of their data infrastructure. Kafka is mature, reliable and offered by trusted providers, but disasters can still happen, and data can still become temporarily unavailable, or be lost altogether. The best way to mitigate that risk is to keep a copy of your data in another Kafka cluster in a different data center. That way, you can switch clients over relatively seamlessly, moving to an alternative deployment on the fly with little or no service interruption. MirrorMaker 2 preserves consumer offset mappings and offers tooling for nearly transparent consumer migration between clusters, which is key to successful disaster recovery.
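As a rough sketch, a disaster-recovery flow like this can be expressed in a MirrorMaker 2 properties file. The cluster aliases `primary` and `backup` and the bootstrap addresses below are placeholders, not part of any real deployment:

```properties
# Hypothetical two-cluster DR setup; aliases and addresses are examples.
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092

# Replicate all topics from the primary to the backup cluster.
primary->backup.enabled = true
primary->backup.topics = .*

# Emit checkpoints that map consumer group offsets between clusters,
# and (Kafka 2.7+) sync them into the backup cluster's consumer
# offsets so consumers can fail over with minimal rewinding.
emit.checkpoints.enabled = true
sync.group.offsets.enabled = true
```

On older Kafka releases without automatic group offset syncing, the checkpoint topics can still be queried programmatically via `RemoteClusterUtils.translateOffsets` to seed consumer offsets in the backup cluster after failover.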
2. Going to the cloud
More and more companies are migrating their Kafka clusters from on-premises installations to the cloud, or between cloud regions and providers. Replicating data between Kafka clusters is an excellent fit for low-downtime cloud migration: mirror the data into the new cluster, then switch clients over once it has caught up.
3. Getting closer
For many global businesses, producing and consuming data in geographically distributed locations is the norm. Replication lets you bring the data to where the users are, cutting latency and network costs and improving throughput.
4. Isolating data
Some data sets may need to be isolated in a separate Kafka cluster for legal, compliance or performance reasons. For instance, to satisfy legal requirements you can keep a short retention period on the topic you write to in one cluster and mirror it to a topic with longer retention in a region where the data may legitimately be stored and read. To boost performance, you can use one cluster to briefly hold incoming data, aggregate it, and mirror only the aggregated data to another cluster. This keeps your ingestion pipeline lean while retaining the important bits, and as a bonus it may save on storage costs, too.
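One way to sketch the retention scenario above: by default MirrorMaker 2 syncs topic configuration from source to target, so to give the mirrored topic longer retention on the target you can exclude the retention settings from syncing and set them independently there. The aliases, topic name and values below are illustrative assumptions:

```properties
# Hypothetical setup: short-retention ingest cluster, long-retention archive.
clusters = ingest, archive
ingest.bootstrap.servers = ingest-kafka:9092
archive.bootstrap.servers = archive-kafka:9092

ingest->archive.enabled = true
ingest->archive.topics = audit-events

# MM2 syncs topic configs by default; exclude the retention settings
# so the archive cluster can keep the mirrored topic longer than the
# source does.
config.properties.exclude = retention.ms, retention.bytes
```

The retention on the archive side is then managed as an ordinary topic-level config (`retention.ms`) on the mirrored topic in that cluster.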
5. Data analytics
Aggregation also matters in analytics pipelines, which often need to consolidate data from distributed Kafka clusters into a single one. That aggregate cluster can then feed the data to other clusters and/or data systems for analysis and visualization.
Kafka MirrorMaker makes life easier
Because MirrorMaker 2 runs on Kafka Connect, it synchronizes topic configuration (including partition counts) and ACLs from source to target clusters. There is no more need for external tooling to make this happen.
In situations where records are partitioned semantically, it's good to know that partitioning is preserved during replication: a record lands on the same partition number in the target cluster, so you never have to rebuild the partitioning yourself.
Complex replication topologies, such as active-active and chained replication, are easy to set up. A single MM2 cluster can run multiple replication flows, and its remote-topic naming scheme prevents replication cycles.
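An active-active topology of this kind can be sketched in a single mm2.properties file. The cluster aliases `dc-east` and `dc-west` and the bootstrap addresses are placeholders:

```properties
# Hypothetical active-active pair; aliases and addresses are examples.
clusters = dc-east, dc-west
dc-east.bootstrap.servers = east-kafka:9092
dc-west.bootstrap.servers = west-kafka:9092

# Enable both replication flows for an active-active setup.
dc-east->dc-west.enabled = true
dc-west->dc-east.enabled = true
dc-east->dc-west.topics = .*
dc-west->dc-east.topics = .*
```

Because the default replication policy prefixes remote topics with the source cluster alias (a topic `orders` from dc-east appears as `dc-east.orders` on dc-west) and does not re-replicate such remote topics, the two opposing flows do not create a cycle.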
Apache Kafka MirrorMaker 2 makes for a robust replication architecture that you can use for multiple purposes. And the best thing is, you don't have to set anything up by yourself: just get it as an add-on to Aiven for Apache Kafka, and let Aiven do the work.
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!