I bricked my database once... damn!

Definitely not the best of feelings. But it will probably happen to everyone at some point in their career. Whether in the first few years or the last. Whether in local, staging, or production. Unfortunately for me, it was in production. Fortunately for me, I had backups.

The reason isn't exactly important or worth mentioning, to be honest. There are countless reasons a database can end up bricked: a bad migration, a deleted volume, a messed-up deployment pipeline. What's important is: what do you do after?

Even if you have state-of-the-art disk-level error correction, with triple partition duplication and quadruple node replication. Even if your database runs with every feature imaginable to prevent data loss: zero risk does not exist.

So, if you value your sanity (and possibly your job): back up your data. And do it properly.

Chances are, if your database can get bricked, whatever your application runs on can too. Which is why, for a proper backup strategy, the industry standard has been the 3-2-1 rule:

Three copies of your data, on two different types of media, with one copy stored offsite.

Now, being the industry standard doesn't mean everyone follows it, or that it's always the right call. In this particular case, I think it is. It could be overkill in some situations, though. It all depends on what you have to lose, to be honest, as well as your risk profile.

In my case, I didn't follow the rule perfectly (I only had one copy, stored offsite), but it was enough!

The whole point of the 3-2-1 backup rule isn't just redundancy for redundancy's sake. Keeping backups only in the same environment as your production system means a single catastrophic event can wipe out everything you thought you had covered.

There are examples of data loss all over the industry, so many that the YouTuber Kevin Fang was able to make a whole series out of them. In one of those stories, this basic backup rule is exactly what saved a company from catastrophe.

In 2024, UniSuper, which manages tens of billions of dollars for hundreds of thousands of members, suffered an unprecedented disaster when their entire cloud account on a major provider was accidentally deleted due to a configuration issue in the provider’s infrastructure.

All the data and backups stored inside that one cloud environment were erased, even those replicated across multiple regions. That meant the usual “high availability” and region-to-region redundancy offered by a cloud platform didn’t protect them.

Thankfully, UniSuper had additional backups maintained outside of that cloud provider, with an independent service. Those offsite backups weren’t affected by what happened inside the cloud account, and they were ultimately what allowed them to restore their systems after about two weeks of downtime.

It’s a powerful real-world illustration of the 3-2-1 rule: if your only safety net lives within the same ecosystem as your production data, a single cascading failure can take all of it down at once.

That’s why the “one copy offsite” part of 3‑2‑1 is so important:

  • Local backups protect against routine failures and corruption.
  • A different media type protects against hardware‑specific issues (e.g., RAID array failure, corrupted snapshots).
  • But the offsite copy guards against the big, unpredictable local failures that can wipe out everything you thought was safe.
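To make that concrete, here is a minimal sketch of what one 3-2-1 backup run can look like, in Python, assuming a Postgres database. The database name, the paths, and the bucket name are all made up, so adjust them to your own setup:

```python
import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path

import boto3  # pip install boto3

DB_NAME = "app"                      # hypothetical database name
LOCAL_DIR = Path("/var/backups/db")  # copy 1: local disk
SECOND_DIR = Path("/mnt/nas/db")     # copy 2: a second media type (e.g. a NAS)
BUCKET = "my-offsite-backups"        # copy 3: offsite, in S3


def backup() -> None:
    LOCAL_DIR.mkdir(parents=True, exist_ok=True)
    SECOND_DIR.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_file = LOCAL_DIR / f"{DB_NAME}-{stamp}.dump"

    # Copy 1: dump the database to local disk (pg_dump's custom format is compressed).
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={dump_file}", DB_NAME],
        check=True,
    )

    # Copy 2: a different media type, e.g. a NAS or a separate physical disk.
    shutil.copy2(dump_file, SECOND_DIR / dump_file.name)

    # Copy 3: offsite, outside the environment that could get bricked with prod.
    boto3.client("s3").upload_file(str(dump_file), BUCKET, dump_file.name)


if __name__ == "__main__":
    backup()
```

Schedule it with cron or a systemd timer, and, just as importantly, try restoring from it once in a while. A backup you have never restored is a hope, not a strategy.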

In my case, an S3 bucket was plenty enough! As long as S3 doesn't get bricked too...