Overview

An overview of the configuration used by our TwitterProducer.

 

 

About acks

  1. acks = 0
  • the producer does not wait for any response
  • if the broker goes offline, we lose data without knowing it
  • useful where losing some data is acceptable: metrics, log collection
  2. acks = 1
  • the producer waits for the leader’s response
  • replication is not guaranteed; it happens in the background
  • the producer may retry if the ack from the leader is not received
  • if the leader goes offline before the replicas have copied the data, we lose data
  3. acks = all
  • the producer waits for the leader and the in-sync replicas to acknowledge
  • adds latency, but also safety
  • no data loss as long as enough replicas stay in sync

  • acks=all must be used in conjunction with min.insync.replicas (a broker or topic setting)
  • min.insync.replicas=x means that at least x brokers that are in the ISR (including the leader) must respond that they have the data
  • e.g.:
    • replication.factor=3, min.insync.replicas=2, acks=all
    • means you can tolerate only one broker being down
    • if two brokers are down, the write is rejected and the producer gets a NotEnoughReplicas exception on send
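
The arithmetic behind that example can be sketched in a few lines of Java. The class and method names are made up for illustration and are not part of any Kafka API; the sketch also assumes every replica lives on its own broker.

```java
// Hypothetical helper, not a Kafka API: it only models the replication arithmetic.
final class MinInsyncSketch {

    // Number of brokers that can be down while acks=all writes are still accepted.
    static int tolerableBrokerFailures(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }

    // True if an acks=all write would be accepted with this many replicas in sync.
    static boolean acceptsWrite(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        System.out.println(tolerableBrokerFailures(3, 2)); // 1
        System.out.println(acceptsWrite(2, 2));            // true: one broker down
        System.out.println(acceptsWrite(1, 2));            // false: two brokers down
    }
}
```

With replication.factor=3 and min.insync.replicas=3 the same arithmetic gives zero tolerated failures, which is why 2 is the more common choice for a replication factor of 3.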

About Producer retries

  • retries let the producer resend a message after a transient failure
  • defaults to 0 on Kafka <= 2.0; Kafka 2.1 and later default to Integer.MAX_VALUE
  • can be set to a high number
  • a retry can cause messages to be sent out of order (e.g. when several batches are in flight and an earlier one is resent)
  • to strictly ensure there is no re-ordering when the producer retries, set max.in.flight.requests.per.connection = 1 (this may hurt throughput), or use an idempotent producer (next section)

About Idempotent Producer (Kafka >= 0.11)

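  • enabled with enable.idempotence = true; the broker then detects and drops duplicates caused by retries
  • comes with:
    • retries = Integer.MAX_VALUE
    • max.in.flight.requests.per.connection = 1 (Kafka >= 0.11 and < 1.1)
    • max.in.flight.requests.per.connection = 5 (Kafka >= 1.1)
    • acks = all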
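
As a rough sketch, the settings above could be collected like this. It uses only java.util.Properties with the producer config names as plain strings; the class name is hypothetical, enable.idempotence is the producer setting that switches idempotence on, and bootstrap.servers and the serializers are left out. The value 5 assumes a Kafka version that allows it, per the list above.

```java
import java.util.Properties;

// Hypothetical holder for "safe producer" settings; pass the result to a KafkaProducer.
final class SafeProducerProps {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        // Assumption: a Kafka version where 5 in-flight requests keep ordering; use 1 otherwise.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Setting the dependent values explicitly is redundant when idempotence is on, but it documents the intent for whoever reads the config next.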

About Message Compression

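  • better throughput
  • better disk utilisation on the brokers
  • faster data transfer over the network
  • smaller producer request size
  • the trade-off is CPU cost: the producer must compress and the consumer must decompress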
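
To get a feel for why a batch of similar messages compresses well, here is a small stand-alone sketch. It does not use Kafka at all: it gzips a made-up batch of JSON lines with java.util.zip and compares sizes. In a real producer the codec is chosen with the compression.type setting (for example gzip, snappy or lz4); the payload and class name below are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: shows the size effect of compressing a repetitive batch.
final class BatchCompressionSketch {

    // Builds a fake batch of similar JSON messages (made-up payload).
    static byte[] fakeBatch(int messages) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < messages; i++) {
            sb.append("{\"user\":\"user_").append(i)
              .append("\",\"lang\":\"en\",\"source\":\"twitter\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Gzips the whole batch at once, the way compression is applied per batch.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = fakeBatch(100);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes: " + raw.length + ", gzip bytes: " + packed.length);
    }
}
```

The bigger and more repetitive the batch, the better the ratio, which is why compression pays off most when combined with batching.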

 

About Linger.ms & batch.size

  • tuning these two settings raises throughput through batching while keeping latency low
  • linger.ms:
    • number of milliseconds a producer is willing to wait before sending a batch out (default 0)
    • if the batch fills up before linger.ms elapses, it is sent immediately
  • batch.size:
    • maximum number of bytes that will be included in a batch (default 16 KB)
    • a message bigger than batch.size will not be batched
    • a batch is allocated per partition, so don’t set it too high
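
A possible high-throughput configuration might look like the sketch below. The concrete values (20 ms, 32 KB, snappy) are illustrative assumptions rather than recommendations from these notes, and the class name is hypothetical; only the config keys are real producer settings.

```java
import java.util.Properties;

// Hypothetical holder for batching settings; merge the result into the producer config.
final class HighThroughputProps {

    static Properties build() {
        Properties props = new Properties();
        // Wait up to 20 ms for more records before sending a batch (assumed value).
        props.setProperty("linger.ms", "20");
        // Allow batches of up to 32 KB per partition (assumed value, default is 16 KB).
        props.setProperty("batch.size", Integer.toString(32 * 1024));
        // Compress each batch before it is sent (assumed codec).
        props.setProperty("compression.type", "snappy");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```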

 

How keys are hashed

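  • The partition is derived from the hash of the key and the number of partitions, so changing the partition count of a topic changes which partition a given key maps to.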
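
The effect can be shown with a simplified stand-in for the partitioner. Kafka's default partitioner hashes the serialized key with murmur2; the sketch below substitutes a plain integer hash (and String.hashCode() for string keys) purely to keep the example self-contained, so the actual partition numbers will differ from what Kafka computes. The class and method names are hypothetical.

```java
// Simplified stand-in for key partitioning: positive hash modulo partition count.
final class KeyPartitionSketch {

    // Maps an already computed key hash to a partition.
    static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & 0x7fffffff) % numPartitions;
    }

    // Convenience overload; String.hashCode() replaces murmur2 here (assumption).
    static int partitionFor(String key, int numPartitions) {
        return partitionFor(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int keyHash = 10;
        System.out.println(partitionFor(keyHash, 3)); // 1
        System.out.println(partitionFor(keyHash, 4)); // 2: same key, different partition
    }
}
```

Because the same key lands on a different partition once the count changes, ordering guarantees per key only hold while the number of partitions stays fixed.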

 

Max.block.ms & buffer.memory

  • if the producer produces faster than the brokers can take, records pile up in the producer’s buffer (buffer.memory, default 32 MB); once it is full, .send() no longer returns right away (it blocks)
  • max.block.ms=60000: if .send() stays blocked for 60 seconds, an exception is thrown. At that point:
    • the producer has filled up its buffer
    • the broker is not accepting any data
    • 60 seconds have elapsed
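
For reference, the two settings can be written out explicitly as below. The values shown are, to the best of my knowledge, the client defaults (32 MB and 60 seconds), so this sketch changes nothing by itself; it is only a starting point for tuning. The class name is hypothetical.

```java
import java.util.Properties;

// Hypothetical holder for buffer settings; merge the result into the producer config.
final class BufferProps {

    static Properties build() {
        Properties props = new Properties();
        // Total bytes the producer may use to buffer records waiting to be sent (32 MB).
        props.setProperty("buffer.memory", Long.toString(32L * 1024 * 1024));
        // How long send() may block on a full buffer before an exception is thrown.
        props.setProperty("max.block.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Hitting the max.block.ms exception usually points to brokers that are down or overloaded rather than to a buffer that is too small.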