
Using Cassandra database on a production level

14/09/2024


Cassandra is my go-to database for any service that I provide. Its fault tolerance and the ease of expanding clusters make it perfect for dealing with lots of reads and writes in production.

People always get confused about what a node is versus what a cluster is. A node is a single server running the Cassandra service. Down below is the output of

nodetool status

Cassandra Status

A cluster is a collection of nodes.

Fault Tolerance

In production, if a node fails due to a disk failure and is unable to read and/or write to the disk, its peers (the other nodes) learn about the failure and the node is marked as down. While a node is down, no requests can be routed to it to insert or update data. This is where replication comes into play: depending on your keyspace configuration, a number of nodes hold copies of each other's data. A good replication factor is around 3, meaning three nodes will hold the same data. If one of those nodes suddenly goes down, requests simply switch to the other replicas.

Replicas are also useful if a natural disaster happens near a datacenter. If, theoretically, the datacenter is filled with water, requests will simply switch to replicas elsewhere. OK, so replicas are really useful when a node goes down, but what other benefits are there? There are also region benefits. Let's say you mainly operate out of San Jose in the USA: you could just spread your nodes across the US, for example in us-east and us-west. That's good, but also not good. If a user from Australia wants to use your service, their requests to your API server and then to Cassandra will take longer than they would for a user located in the US. If you instead scatter your nodes around the world, for example in us-west, jp-east, and uk-south, with a corresponding API server located in each region, users can achieve blazing fast response times. The culprit here is round-trip latency rather than bandwidth: a request from Sydney, Australia to San Jose, US pays well over 100 ms of network round trip, while a request served within Sydney does not.
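A multi-datacenter replication setup like this is configured per keyspace. A minimal sketch, assuming three datacenters named after the regions mentioned above (the keyspace name and datacenter names are placeholders; they must match what your snitch reports):

```sql
-- Three replicas of every row in each datacenter; the datacenter
-- names must match what `nodetool status` shows for your cluster.
CREATE KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-west': 3,
    'jp-east': 3,
    'uk-south': 3
  };
```

With this in place, clients can read and write at `LOCAL_QUORUM` consistency so each request only waits on replicas in its own datacenter, which is what keeps those regional response times fast.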

Expanding

As your user base grows, you need more storage and more nodes to take pressure off the existing ones, keep responses fast, etc. Adding a node to a cluster isn't really hard after doing it 100 times. Setting up config files, JMX, TLS, authentication, and data directories is really the bare minimum; a lot of the time it's copying and pasting TLS certs, keystores, truststores, and config files. When you add a node to a cluster, all Cassandra nodes become aware of the new node via the gossip protocol. Gossip is pretty much nodes periodically exchanging state with one another: which nodes exist, their status and load, schema versions, the discovery of a new node, etc. Through these exchanges, over time each node learns the full topology of the cluster.
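As a sketch, the key settings a new node needs in `cassandra.yaml` before it can join an existing cluster look something like this (the cluster name and IP addresses are illustrative, not from a real deployment):

```yaml
# cassandra.yaml (excerpt) -- values are illustrative
cluster_name: 'MyCluster'      # must match the existing cluster exactly
listen_address: 10.0.1.15      # this node's own IP, used for gossip
rpc_address: 10.0.1.15         # address clients connect to
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # a few existing nodes the newcomer contacts first to start gossiping
      - seeds: "10.0.1.10,10.0.1.11"
```

Once started, the new node contacts the seeds, joins the gossip ring, and begins bootstrapping data; `nodetool status` will show it as UJ (Up/Joining) until it has finished streaming its share of the data.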

A datacenter is, well, a collection of nodes in a datacenter, nothing special to it. In Cassandra terms it's a logical grouping of nodes, usually mapped to a physical datacenter or cloud region.
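Which datacenter (and rack) a node belongs to is declared per node. With the commonly used GossipingPropertyFileSnitch it's a two-line file on each node (the names below are placeholders matching the regions used earlier):

```properties
# cassandra-rackdc.properties -- illustrative values
dc=us-west
rack=rack1
```

These names are what the replication settings in your keyspaces refer to, so they have to be consistent across the cluster.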

Clusters with lots of nodes

Cassandra has been tested on a cluster of 1,000 nodes using hundreds of real-world use cases and schemas, exercised with replay, fuzz, property-based, fault-injection, and performance tests. At the high end, Apple reportedly runs over 100,000 Cassandra nodes in production, spread across many clusters.

More info

Cassandra's own docs provide way more useful information; they go into depth on how data is stored, and much more. https://cassandra.apache.org/_/cassandra-basics.html