Lessons Learned Running Servers
Ten years of running my own infrastructure, distilled into a dozen lessons. Some I learned the hard way. Some I am still learning.
I have been running my own servers, in some form, for roughly a decade. This is a list of lessons, written down so I can come back to them. Some are obvious in hindsight; others took real damage before they sank in.
1. Backups are not optional. They are the difference between a learning exercise and a disaster.
I lost a database in 2017. It was a personal project, no production users, but I had not backed it up. The data was not important, but the lesson was: I never want to feel that way again.
The right backup system has three properties:
- Automatic. You do not want to be the one who remembers to run the backup.
- Offsite. A backup that is on the same machine as the data is not a backup. A backup that is on the same network as the data is barely a backup.
- Tested. A backup that has never been restored is a hope, not a backup. Test the restore at least quarterly.
I use restic for most things, with the destination being a remote object store. The setup took an hour. The peace of mind is permanent.
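The "tested" property is the easiest one to skip, so here is a minimal sketch of a restore check. It assumes you have already restored a snapshot into a scratch directory with whatever backup tool you use; the function names are hypothetical, and it simply compares SHA-256 digests of the live tree against the restored one.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch,
            # but large files would want chunked hashing.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def restore_problems(source: Path, restored: Path) -> list:
    """Return files that are missing from, or differ in, the restored tree."""
    original = tree_digests(source)
    copy = tree_digests(restored)
    return sorted(name for name, digest in original.items() if copy.get(name) != digest)
```

An empty list means every file in the source came back byte-for-byte. Files that changed after the snapshot was taken will show up as differences, so this works best against data that was quiet between backup and check.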
2. The default firewall is your friend. Leave it on.
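Fedora ships with firewalld enabled, and Ubuntu ships with ufw installed (although it stays inactive until you enable it). Either way, a policy that denies incoming traffic by default is the correct starting point. The mistake is to turn the firewall off because something does not work.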
The right approach is to allow only the ports you need, only from the IPs you trust, and only for the duration you need them. If you are running a web server, you need 80 and 443 open to the world. If you are running SSH, you need port 22 open to your IP, not to the world. If you are running a database, you do not need it open to the world at all.
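The policy above is a default-deny allow-list. This is not firewall configuration, just a small model of the decision the firewall makes, under assumed values: the `Rule` type, the `RULES` list, and the admin address 203.0.113.10 (a documentation address) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # CIDR range allowed to reach this port


# Hypothetical policy: web open to the world, SSH open to one admin
# address, and no rule at all for the database port.
RULES = [
    Rule(80, "0.0.0.0/0"),
    Rule(443, "0.0.0.0/0"),
    Rule(22, "203.0.113.10/32"),
]


def is_allowed(rules, port: int, source_ip: str) -> bool:
    """Default deny: allow only if some rule matches both port and source."""
    addr = ip_address(source_ip)
    return any(r.port == port and addr in ip_network(r.source) for r in rules)
```

With these rules, `is_allowed(RULES, 22, "198.51.100.7")` is false and so is any request to port 5432, because nothing that is not listed gets through.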
3. SSH key authentication. Always. Passwords, never.
A password is something you know. A key is something you have. The security difference is enormous. A brute-force attack against a password is plausible. A brute-force attack against an Ed25519 or 4096-bit RSA key is not.
The setup is:
- Generate a key pair on your laptop (ssh-keygen -t ed25519).
- Copy the public key to the server (ssh-copy-id).
- Disable password authentication in /etc/ssh/sshd_config (PasswordAuthentication no).
- Restart sshd.
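Once this is done, password-guessing attacks against your server can no longer succeed. Bots will keep connecting, but the log noise from failed password attempts should drop sharply, and the one thing they were trying to guess no longer exists.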
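It is worth confirming that the change actually took, because an earlier line in the file can silently win. A minimal sketch, assuming you pass it the text of /etc/ssh/sshd_config; the function names are hypothetical, and the parser is deliberately simplified (it does not follow Include files and stops at the first Match block).

```python
from typing import Optional


def effective_setting(config_text: str, keyword: str) -> Optional[str]:
    """Return the first value given for keyword (sshd uses the first one it sees)."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        parts = line.split(None, 1)
        if parts[0].lower() == "match":
            break  # simplification: ignore conditional blocks
        if parts[0].lower() == keyword.lower() and len(parts) == 2:
            return parts[1].strip()
    return None


def password_login_disabled(config_text: str) -> bool:
    """True only if the config explicitly sets PasswordAuthentication to no."""
    value = effective_setting(config_text, "PasswordAuthentication")
    return (value or "").lower() == "no"
```

An unset keyword counts as "not disabled" here, since sshd allows passwords unless told otherwise. For the authoritative answer on a real machine, `sshd -T` prints the effective configuration.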
4. Updates are not a chore. They are a feature.
Every CVE that is announced is a known vulnerability. Every unpatched server is a known exposure. The right approach is to update weekly, with a script, and to read the changelog for security-sensitive packages.
Fedora’s dnf-automatic and Ubuntu’s unattended-upgrades are good defaults. Once enabled and configured, they download and install security updates automatically. You still need to reboot for kernel updates, but the rest is handled.
The thing you do not want to do is “set it up and forget it for a year.” That works until the day it does not. The right cadence is weekly. The right tooling is automation. The right mindset is “updates are part of the job.”
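The reboot is the part automation tends to leave to you. A minimal sketch of a check, assuming a Debian or Ubuntu system, where a flag file at /var/run/reboot-required is created when an installed update needs a restart. The function name is hypothetical, and other distributions signal this differently.

```python
from pathlib import Path

# Assumption: Debian/Ubuntu convention. The file exists only while an
# installed update (typically a kernel) is waiting for a reboot.
REBOOT_FLAG = Path("/var/run/reboot-required")


def reboot_pending(flag: Path = REBOOT_FLAG) -> bool:
    """Return True if the system has flagged that a reboot is needed."""
    return flag.exists()
```

Run from a weekly job, a check like this turns "I should reboot at some point" into a reminder that actually arrives.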
5. Logs are for searching, not for reading.
The default behaviour, on most servers, is to write logs to /var/log, rotate them weekly, and never look at them. This is wrong. The logs are the most useful debugging tool you have.
The right approach is:
- Centralise the logs. Ship them to a single place. Loki, Elasticsearch, or even a separate machine.
- Index them. Make them searchable. grep is the wrong tool for production logs.
- Alert on them. If something unusual is in the logs, you should know about it before the user tells you.
The setup cost is a few hours. The debugging time saved is hundreds of hours over the lifetime of the server.
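To make "alert on them" concrete, here is a minimal sketch of the kind of rule a log pipeline evaluates: count failed SSH password attempts per source address and flag the noisy ones. The function names and the threshold are hypothetical, and the pattern assumes the usual OpenSSH "Failed password" message format. A real setup would express this in the query language of whatever log store you chose.

```python
import re
from collections import Counter

# Matches OpenSSH lines such as:
#   Failed password for invalid user admin from 203.0.113.5 port 52144 ssh2
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+")


def failed_logins_by_ip(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def noisy_sources(lines, threshold=3):
    """Return addresses with at least `threshold` failures, worst first."""
    return [ip for ip, n in failed_logins_by_ip(lines).most_common() if n >= threshold]
```

The point is the shape, not the script: a structured question ("which sources failed more than N times?") that runs without anyone reading the log by eye.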
6. The 3-2-1 backup rule works. The 3-2-1 backup rule is also a minimum.
The 3-2-1 rule is: 3 copies of the data, on 2 different media, with 1 offsite.
This is the floor, not the ceiling. The right approach for important data is:
- 3-2-1, plus
- Encryption at rest
- Encryption in transit
- Versioned backups (so you can recover from a few weeks ago, not just last night)
- Tested restores (quarterly)
- A documented recovery procedure (so the next person can do it)
The cost is low. The cost of not having it is high.
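The rule itself is simple enough to write down as a check. A minimal sketch, where the `DataCopy` type and its fields are hypothetical and stand in for however you inventory your copies:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    medium: str    # e.g. "ssd", "hdd", "object-store"
    offsite: bool


def meets_3_2_1(copies) -> bool:
    """True if there are >= 3 copies, on >= 2 media, with >= 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

Note that the live data counts as one of the three copies. The check only covers the floor; encryption, versioning, and tested restores still have to be verified separately.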
7. The right monitoring is the kind that wakes you up when something is wrong, and stays quiet the rest of the time.
The wrong monitoring is the kind that wakes you up for every blip. If your phone buzzes 10 times a night, you will start ignoring it. The right alerts are:
- Actionable. You should know what to do when the alert fires.
- Urgent. The alert should be for things that need attention soon, not things that can wait until morning.
- Documented. The alert should link to a runbook.
Uptime Kuma, Nagios, and Prometheus all work. Pick one. Set up the basics. Refine over time. The goal is a system that you trust to tell you when something is wrong.
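One common way to stop a monitor from paging on every blip is to require several consecutive failed checks before alerting. A minimal sketch of that idea; the class name and the threshold of 3 are hypothetical, and the tools above each have their own built-in way to configure retries or a hold period.

```python
class FlapFilter:
    """Fire an alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def observe(self, healthy: bool) -> bool:
        """Record one check result; return True exactly when an alert should fire."""
        if healthy:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # Fire once when the streak reaches the threshold, not on every
        # failure after it, so a long outage produces a single page.
        return self.failures == self.threshold
```

A single failed check followed by a recovery never reaches the threshold, so it never wakes anyone up. The trade-off is detection delay: with a one-minute check interval and a threshold of 3, a real outage pages about three minutes in.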
8. The simplest solution that works is the right solution.
I have over-engineered many things. I have built a Kubernetes cluster to run a static website. I have written a custom service discovery system for two services. I have used an event-driven architecture for a CRUD app.
In hindsight, the simpler solution would have worked just as well. The more complex solution was harder to maintain, harder to debug, and easier to break.
The right question is not “what is the most elegant architecture?” It is “what is the most boring solution that will work for the next five years?” Boring is good. Boring is maintainable. Boring is what you can hand off to the next person.
9. Document what you did. You will forget.
I have started a server, configured it perfectly, and then forgotten what I did. Six months later, something breaks, and I am reading log files and git log to figure out what I did.
The right approach is to document, in a single place, what is running, why it is running, and how to rebuild it. The format does not matter. Markdown, a wiki, a README.md in the repo — whatever you will actually maintain. The discipline is what matters.
The next person who has to debug the server (which might be you) will thank you.
10. The cost of a server is not the hardware. It is the time.
The hardware is cheap. A used mini-PC is ₹15,000. A small VPS is ₹500/month. The interesting cost is the time you spend configuring, debugging, updating, and monitoring.
The right approach is to be honest about how much time you have. If you have ten hours a week, you can run a homelab with a few services. If you have two hours a week, you should pay for managed services and focus on the application. There is no shame in either choice.
The wrong approach is to set up a complex infrastructure, get bored with it, and let it rot. A half-maintained server is worse than no server at all.
11. The community is your best resource.
The Linux and self-hosting communities are remarkably helpful. The right places to ask questions are:
- The project’s own forum or Discord. Maintainers and experienced users hang out there.
- Server Fault / Stack Exchange. For specific, well-formed questions.
- Reddit (r/selfhosted, r/homelab, r/linux). For general advice and inspiration.
- The documentation. Read the docs first. Most of the time, the answer is there.
The wrong place to ask questions is Twitter. The right place is somewhere that has a search function.
12. The most important thing is to enjoy it.
Running a server should be a hobby, not a chore. If it stops being fun, the right answer is to step back, simplify, or pay someone else to do it. The goal is to learn, to build, and to understand. Not to maintain a fragile system that you resent.
I still enjoy it, ten years in. I expect to enjoy it for another ten. The most important reason is that I have learned to keep the system small, the backups tested, the documentation current, and the time investment bounded.
These are the lessons. They are not novel. They are the kind of thing that is obvious in retrospect, and that I had to learn by doing. If you are starting out, save yourself the time: read the docs, automate the boring stuff, back up the data, and enjoy the journey.