Chapter 4:
How to kick-start the journey towards improved logging

Establish a dedicated team

ManageEngine's internal log management team is organized into several sub-teams, each with a specific focus. One team manages search and querying, including Elasticsearch and the user search functions. Another team manages the distributed storage and streaming systems, such as Hadoop and Kafka. Finally, a third team is dedicated to log agent and analytics management.

These teams handle requests from the various product teams within ManageEngine. The search and distributed systems teams work with a directly responsible individual (DRI) from each product or application team. When a product team has queries or issues, its DRI consolidates them and submits them to the logs team, and the logs team member responsible for that area responds to the request.

The DRIs are knowledgeable about logging and well-versed in the requirements of their respective product teams. ManageEngine uses an in-house service management application to track DRIs, their queries, and related tasks. In addition, a separate group for logs coordinators on our collaboration platform enables efficient communication between the DRIs and the logs team, so that log-related issues are managed and resolved smoothly.

Apply best practices for log management

At ManageEngine, our extensive experience with logging has allowed us to develop a set of logging best practices. We make sure our developers know these practices, and we regularly review their logging code to give them feedback on how it could be improved.

Here are the best practices we advise our developers to follow:

1. Log only on error or unexpected behavior:

Avoid logging every event or action. Instead, log only on errors or unexpected behavior, and log successful outcomes at a lower level such as FINE.

2. Avoid redundant logging:

Don't log information that is already captured in another log, such as the access log or the task engine log. This reduces redundancy and the overall size of the logs.

3. Avoid logging multiple times:

Avoid writing several log entries during a single action or operation. Instead, combine the details into one entry written at the start or end of the operation to minimize log size.

4. Avoid logging select queries and resource access:

Unless necessary, avoid logging SELECT queries, data objects, or access to resources such as Redis or DFS. If you do need them, log them at a lower level such as FINE.

5. Do not log large user content:

Avoid logging large user-supplied content, such as uploaded files, as it can significantly increase log size.

6. Avoid logging inside loops:

Avoid logging inside loops that can run many iterations. Doing so creates a large number of unnecessary log entries and inflates the log size.

7. Suppress exception logging for known exceptions:

For known, expected exceptions, log only the exception name instead of the full exception. This reduces the log size and the amount of information to sift through.

8. Log with proper level:

Always log at the appropriate level, and avoid logging everything at the SEVERE level. Accurate levels keep the logs meaningful and easy to search.

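9. Consider whether each log statement is necessary:

Before adding a log statement, consider whether it is needed and how much volume it will generate in production. Even statements at levels that are not printed, such as FINE, can affect application server performance, because the level check and any eagerly built message still run.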

10. Be mindful of log retention:

Log retention can significantly affect the cost of maintaining logs. Each TB of logs kept in our active (searchable) system might cost around $1,000 to maintain, so be mindful of the cost associated with your log retention period.
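The level names FINE and SEVERE used in these practices are java.util.logging levels, so the sketch below assumes that framework. It illustrates several of the practices in one place: no logging inside the loop, one combined entry per operation, known exceptions reduced to their names, success at FINE, and no SEVERE for expected conditions. The class, the exception, and the import logic are hypothetical and stand in for real application code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ImportLoggingSketch {
    private static final Logger LOG = Logger.getLogger(ImportLoggingSketch.class.getName());

    /** Hypothetical "known" exception that the import logic expects and handles. */
    static class DuplicateRecordException extends Exception {
        DuplicateRecordException(String message) { super(message); }
    }

    /** Hypothetical per-record operation: treats any record starting with "dup" as a duplicate. */
    static void importRecord(String record) throws DuplicateRecordException {
        if (record.startsWith("dup")) {
            throw new DuplicateRecordException("Duplicate record: " + record);
        }
    }

    /** Imports all records and writes at most one log entry for the whole operation. */
    public static String importRecords(List<String> records) {
        int imported = 0;
        List<String> knownFailures = new ArrayList<>();
        for (String record : records) {
            try {
                importRecord(record);
                imported++;                      // no per-iteration logging inside the loop
            } catch (DuplicateRecordException e) {
                // Known exception: keep only its name, not the stack trace.
                knownFailures.add(e.getClass().getSimpleName());
            }
        }
        String summary = "imported=" + imported + ", skipped=" + knownFailures.size();
        if (knownFailures.isEmpty()) {
            // Success goes to FINE; the lambda defers building the message,
            // so no string is built when FINE is disabled.
            LOG.fine(() -> "Import succeeded: " + summary);
        } else {
            // One combined entry at a moderate level instead of SEVERE.
            LOG.log(Level.INFO, "Import finished with skipped records: {0} {1}",
                    new Object[] {summary, knownFailures});
        }
        return summary;
    }

    public static void main(String[] args) {
        System.out.println(importRecords(List.of("a", "dup-1", "b")));
    }
}
```

Running main prints imported=2, skipped=1 to standard output, while the logger emits a single INFO entry for the whole batch rather than one entry per record or a stack trace for the expected duplicate.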

By following these best practices, our developers not only improve their logging but also become more aware of their overall software development process. We constantly refine these guidelines as we optimize our log management practices, and we pass what we learn on to our developers. These best practices help us conserve critical company resources and optimize our logging process.

To ensure efficient communication with our developers and effective centralized log management, our logs team follows specific guidelines.

Here are some of the best practices our log management team adheres to:

Prepare for unpredictable data volumes:

The volume of data logged each day cannot be predicted reliably. To avoid being caught off guard, we always provision at least double the resources we expect to need. Even if CPU usage is only 10% and storage usage only 20%, we should not become complacent: it is better to have resources underutilized than to face a log management crisis.

Account for multi-threading in logging:

In our applications, logging happens from many threads concurrently, so we cannot assume that only one thread will write to a log at a time. We must anticipate and address concurrency issues when designing the logs service to prevent application crashes. Being mindful of these challenges lets us keep log management running smoothly and effectively support our developers.
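One common way to handle the concurrency concern above is to let application threads hand log lines to a bounded queue that a single writer thread drains, so the log destination is only ever touched by one thread. The sketch below assumes that design; AsyncLogWriter and its in-memory sink are hypothetical stand-ins, not ManageEngine's actual logs service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical asynchronous log writer: many threads enqueue, one thread writes. */
public class AsyncLogWriter implements AutoCloseable {
    // Sentinel built with new String so only this exact instance (identity check) stops the writer.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue;
    private final List<String> sink = new ArrayList<>(); // stands in for a file or log shipper
    private final AtomicLong dropped = new AtomicLong();
    private final Thread writer;

    public AsyncLogWriter(int capacity) {
        queue = new LinkedBlockingQueue<>(capacity);
        writer = new Thread(this::drain, "log-writer");
        writer.start();
    }

    /** Safe to call from any thread. Never blocks: counts and drops the line if the queue is full. */
    public void log(String line) {
        if (!queue.offer(line)) {
            dropped.incrementAndGet();
        }
    }

    /** Runs only on the writer thread, so the sink needs no locking. */
    private void drain() {
        try {
            for (String line = queue.take(); line != STOP; line = queue.take()) {
                sink.add(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Queue is FIFO, so every line enqueued before STOP is written before the writer exits. */
    @Override
    public void close() throws InterruptedException {
        queue.put(STOP);
        writer.join(); // join() makes the writer's sink updates visible to this thread
    }

    /** Call only after close(). */
    public int writtenCount() { return sink.size(); }

    public long droppedCount() { return dropped.get(); }

    public static void main(String[] args) throws InterruptedException {
        AsyncLogWriter logWriter = new AsyncLogWriter(100_000);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < 8; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    logWriter.log("thread-" + id + " event " + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) worker.join();
        logWriter.close();
        System.out.println(logWriter.writtenCount() + " written, " + logWriter.droppedCount() + " dropped");
    }
}
```

With a capacity of 100,000 and 8,000 lines, main prints 8000 written, 0 dropped. Using offer instead of put is a deliberate choice in this sketch: a slow log destination can never block application threads or exhaust memory, at the cost of dropping (and counting) lines under extreme load.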

Conclusion

Logging is an essential part of modern IT organizations that are centered around software development, like ours. Our journey from disparate logging to creating a well-oiled machine for log management has been a continuous learning process. We've been able to develop robust best practices to help us handle the volume of logs generated by our software applications.

As we look to the future, we see many exciting opportunities to improve our log management framework. For example, we're looking at ways to incorporate logs into our overall monitoring strategy. By connecting the dots between logs, metrics, and traces, we'll gain deeper insights into our IT environment and be able to troubleshoot issues more quickly and effectively.

Ultimately, our commitment to evolving our log management framework is unyielding, and we look forward to the journey ahead.

About ManageEngine

As the IT management division of Zoho Corporation, ManageEngine prioritizes flexible solutions that work for all businesses, regardless of size or budget. ManageEngine crafts comprehensive IT management software with a focus on making your job easier. Our 120+ award-winning products and free tools cover everything your IT needs. From network and device management to security and service desk software, we’re bringing IT together for an integrated, overarching approach to optimize your IT.

About the author

Shivaram’s expertise in IT processes comes from solid experience with IT operations, audits and compliance. Working closely with Zoho in multiple product and operation teams over the years, he understands how to handle the nitty-gritty details of IT processes, and how to nurture them for the better functioning of an organization. As part of ManageEngine Academy, a division of ManageEngine, he now helps customers solve their IT and process-related problems.
