9. Juggling Logs and Other Circus Tricks
Production architectures are generally busy places. Things are happening all the time: customers are being served, reports are being run, and components are breaking and getting fixed. How do we know this? Logs.
The logs written throughout the architecture are vital for monitoring, auditing, and troubleshooting. Yet despite their importance, the infrastructure to journal and analyze logs is often one of the most neglected components in a large architecture.
There are two types of logs: those needed to provide reporting and those needed to troubleshoot problems. Because reports serve a purpose and are expected and reviewed, the logs that feed them must be correctly processed. Logs used to troubleshoot problems are often needed only when disaster strikes, and nearly as often, those logs are insufficient to address the troubleshooting needs.
This leads to the obvious question: "What is so difficult about logging?" Writing the logs from 20, 50, or 100 web servers into a centralized location is not rocket science. There are bad ways, good ways, and great ways to go about logging. Amazingly, the vast majority of the enterprise architectures that I have reviewed go about it completely wrong.
In this chapter, we will cover several methods of log aggregation and discuss how logging infrastructures have evolved to address the needs of today's large architectures.