Chapter 3: Zia in IT

AI has wide-ranging applications in technology and will keep evolving over the next decade; we've only scratched the surface. Today, users in fields like healthcare, manufacturing, retail, and education can tap into its potential to accelerate their IT processes. In this chapter, we'll look at how ManageEngine uses Zia internally across our service desk, device management, advanced analytics, and security applications.

IT service desk

ManageEngine builds enterprise IT solutions for other businesses, but we also have our own IT needs and an in-house IT team working around the clock to manage assets, product releases, change requests, and incidents.

A chatbot acts as a filter for redundant requests and helps the IT team reduce the number of tickets. It's the first step in the help desk cycle. When an employee needs help with an issue they're facing, Zia Chatbot can answer their questions. ManageEngine recently launched Zia Blended Conversations to build interactive user conversations. System admins can design workflows with responses to FAQs and automate frequent processes using a low-code flow builder. When a question is raised, Zia provides a solution from the knowledge base and enhances self-service capabilities.
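To make the self-service step concrete, here is a minimal sketch of matching an employee's question to a knowledge base entry. It assumes a simple word-overlap (Jaccard) score; the chapter doesn't say how Zia actually retrieves solutions, and the FAQ entries, function names, and `min_score` cutoff are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical knowledge base: FAQ question -> suggested solution.
KNOWLEDGE_BASE = {
    "How do I reset my password?": "Use the Reset Password option in the self-service portal.",
    "How do I connect to the office VPN?": "Install the VPN client and sign in with your work account.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_solution(question: str, kb: dict = KNOWLEDGE_BASE, min_score: float = 0.3) -> Optional[str]:
    """Return the solution whose FAQ shares the most words with the question.

    Returns None when no entry scores at least min_score; in that case the
    chatbot would offer to raise a ticket instead.
    """
    query = tokenize(question)
    best_solution, best_score = None, 0.0
    for faq, solution in kb.items():
        faq_words = tokenize(faq)
        union = query | faq_words
        score = len(query & faq_words) / len(union) if union else 0.0
        if score > best_score:
            best_solution, best_score = solution, score
    return best_solution if best_score >= min_score else None
```

A question like "How can I reset my password" matches the password entry, while "Printer jammed on floor 3" matches nothing and returns None, which is the point where the chatbot would hand off to ticket creation.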

Let's say Jane from the UI/UX team is having trouble with her software, and the chatbot can't provide the exact resolution she's looking for. Zia can help her raise a ticket. Once the ticket is raised, it's categorized by the pattern it follows, that is, its type and template. For example, a ticket containing words like "thank you" can be classified as gratitude, while tickets containing "broken" or "not working properly" can be classified as complaints. Template prediction and category prediction pick up these differences across emails and tickets and sort them so they can be resolved more effectively. Each prediction is recorded, and the user marks it as correct or incorrect. This feedback gives Zia training data, enabling better-informed predictions in the future.
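The keyword example and the feedback loop can be sketched as follows. This is a simplified rule-based stand-in, not Zia's ML-based category and template prediction; the keyword lists and the `predict_category` and `FeedbackLog` names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules. Complaints are checked first so that a reply like
# "Thanks, but it's still broken" is not filed as gratitude.
CATEGORY_KEYWORDS = {
    "complaint": ("broken", "not working", "crash", "error"),
    "gratitude": ("thank you", "thanks", "appreciate"),
}


def predict_category(text: str) -> str:
    """Return the first category whose keywords appear in the ticket text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"


@dataclass
class FeedbackLog:
    """Stores each prediction with the user's correct/incorrect verdict."""

    records: list = field(default_factory=list)

    def record(self, text: str, predicted: str, correct: bool) -> None:
        self.records.append((text, predicted, correct))

    def accuracy(self) -> float:
        """Share of predictions that users marked as correct."""
        if not self.records:
            return 0.0
        return sum(1 for _, _, correct in self.records if correct) / len(self.records)
```

The recorded verdicts are the kind of labeled data a learned model could be retrained on; fixed keyword lists like these cannot improve on their own.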

Based on the subject and description provided by Jane, Zia suggests an appropriate category and template, and the ticket is assigned to a technician. This makes it easier for technicians since they no longer have to assign these details manually.

On the technician's end, Zia can be configured to log tickets and display them by priority level. Once the technician resolves Jane's issue, the ticket is marked as completed. However, Jane needs further assistance because the tool is now asking for a security key she doesn't have, so she sends another email to the system admin. Zia analyzes her reply for keywords and reopens her request. Why do we need Zia to do that? By telling genuine follow-ups apart from thank-you notes (even though they're appreciated), acknowledgment notifications, and automated emails, Zia helps technicians prioritize critical requests without flooding the service desk.
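A minimal sketch of the reopen decision, assuming simple phrase matching. The chapter only says Zia analyzes the reply for keywords; the marker lists and the `should_reopen` function are hypothetical.

```python
# Hypothetical phrase lists for replies to a ticket that is already completed.
AUTOMATED_MARKERS = ("automatic reply", "auto-reply", "out of office", "delivery status notification")
FOLLOW_UP_MARKERS = ("still", "again", "asking for", "not working", "error", "need help")


def should_reopen(reply: str) -> bool:
    """Decide whether a reply to a completed ticket should reopen it."""
    text = reply.lower()
    if any(marker in text for marker in AUTOMATED_MARKERS):
        return False  # automated emails never reopen a ticket
    # Plain acknowledgments such as "Thanks, it works now" contain no follow-up marker.
    return any(marker in text for marker in FOLLOW_UP_MARKERS)
```

Jane's reply about the tool asking for a security key contains "asking for" and reopens the ticket, while a plain thank-you or an out-of-office notice leaves it closed.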

If the problem lies with the device rather than the software, Jane will need a new device. The technician then needs approval from her manager and from the system admin who monitors device availability. Here, the approval prediction engine automates the approval requests, eliminating the manual work that would otherwise be involved.

Finally, the service desk manager can check the self-service adoption rate on the live analytics dashboard and add more content to the knowledge base. Zia Chatbot draws on this content to answer more questions, which reduces the number of incoming tickets and lets technicians focus on critical, SLA-bound tasks.

Analyzing help desk data

As an enterprise, we have system admins working from multiple hub offices, handling requests raised from their own hub and nearby spoke offices. The head of IT oversees IT services from our headquarters in Chennai, India. The system admins can use Ask Zia, the NLP-powered chatbot in our analytics solution, to pull up key metrics.

Let's say the IT team is conducting a quarterly review. The system admin can pull up a report on the overall volume and type of incoming tickets. The report breaks tickets down by region (to identify the regions with the most tickets), by time period (e.g., Texas in Q2 2023), and by category (e.g., new device, device repair, non-IT asset request, and so on).
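The report's breakdowns amount to grouped counts. Here is a minimal sketch, assuming each ticket carries a region, a quarter label, and a category; the `Ticket` type and function names are hypothetical and not part of Ask Zia.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Ticket(NamedTuple):
    region: str
    quarter: str
    category: str


def category_counts(tickets, region: Optional[str] = None, quarter: Optional[str] = None) -> Counter:
    """Count tickets per category, optionally filtered by region and quarter."""
    return Counter(
        t.category
        for t in tickets
        if (region is None or t.region == region)
        and (quarter is None or t.quarter == quarter)
    )


def busiest_regions(tickets, top: int = 3):
    """Return (region, ticket count) pairs for the regions with the most tickets."""
    return Counter(t.region for t in tickets).most_common(top)
```

For instance, `category_counts(tickets, region="Texas", quarter="Q2 2023")` gives the per-category breakdown for Texas in that quarter, and `busiest_regions(tickets)` answers which regions raise the most tickets.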

The data from these reports holds significant value. It's crucial for planning ahead to improve operating efficiency, and it helps us understand user behavior and answer questions like:

What is our ticket-to-technician ratio?
Was there a specific incident that occurred frequently?
How much are these incidents costing us?
What is the average age of devices in the workplace?
What is our spending per employee?
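Several of these questions reduce to simple arithmetic over the report data. The sketch below is illustrative only: the function names are hypothetical, and the incident cost model (occurrences × hours per incident × hourly cost) is an assumption, since the chapter doesn't say how costs are calculated.

```python
def ticket_to_technician_ratio(tickets: int, technicians: int) -> float:
    """Tickets handled per technician over the same period."""
    if technicians <= 0:
        raise ValueError("technicians must be positive")
    return tickets / technicians


def incident_cost(occurrences: int, hours_per_incident: float, hourly_cost: float) -> float:
    """Assumed cost model: technician time spent on a recurring incident."""
    return occurrences * hours_per_incident * hourly_cost


def average_device_age(ages_in_years: list) -> float:
    """Mean age of the devices in the workplace."""
    if not ages_in_years:
        raise ValueError("no devices")
    return sum(ages_in_years) / len(ages_in_years)


def spend_per_employee(total_it_spend: float, employees: int) -> float:
    """IT spending divided across the workforce."""
    if employees <= 0:
        raise ValueError("employees must be positive")
    return total_it_spend / employees
```

With made-up figures, 1,200 tickets across 15 technicians gives a ratio of 80 tickets per technician.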

The data from this report is closely tied to our finances. With predictive analysis, our system admins can forecast the request trend for Q3 and Q4 2023 and order assets accordingly. These details are sent to the finance team so they can allocate the required funds for assets and plan the budget for each quarter. Most importantly, this data gives a bird's-eye view of our ITOps, database, and IT analytics, which helps with decision-making.
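The chapter doesn't name the predictive model behind the Q3 and Q4 forecast. As a stand-in, here is a minimal ordinary least-squares trend line that extrapolates quarterly request counts and turns the forecast into an order quantity; the function names and the stock-subtraction rule are hypothetical.

```python
import math


def linear_forecast(history: list, periods_ahead: int) -> list:
    """Fit y = intercept + slope * t by ordinary least squares (t = 0, 1, ...)
    and extrapolate the next `periods_ahead` periods."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two past periods")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + k) for k in range(periods_ahead)]


def devices_to_order(forecast_requests: list, devices_in_stock: int) -> int:
    """Round the forecast total up and subtract devices already in stock."""
    needed = math.ceil(sum(forecast_requests))
    return max(0, needed - devices_in_stock)
```

If new-device requests over four past quarters were 100, 120, 140, and 160, the trend line forecasts 180 and 200 for the next two quarters; with 50 devices in stock, that suggests ordering 330. A straight line ignores seasonality, so a real forecast would likely use a richer model.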

Endpoint management

The hybrid work model has become the norm worldwide. At Zoho, we also offer the option of working from a spoke office, allowing employees to move away from the hustle and bustle of the city and embrace a quiet, rural lifestyle. With this model, our system admins often have no physical access to employees' devices. Zia bridges that gap, connecting admins and devices across locations while keeping the work environment protected.

Consider a scenario where a system admin, John, in our Indian headquarters receives an alert about a device in our spoke office in New Braunfels, TX. John can Ask Zia to pull up reports on managed assets to identify the device owner and the issue that triggered the alert. The device belongs to a developer, Rob, who is currently working remotely. Using Zia's voice command feature in the Endpoint Central mobile app, John establishes a remote connection with Rob's laptop.

With a simple voice command, John asks Zia to:

Scan, detect, and deploy missing patches.
Uninstall unnecessary software.
Update license details for allowed software.

John gets the job done from the other side of the world. Adding AI to endpoint management establishes better accountability and security.

Monitoring

A system admin's job is demanding, to say the least. Our system admins keep things running smoothly, and sometimes they clock back in after regular work hours to deal with emergencies like network outages and security breaches.

Using ManageEngine's monitoring solution, system admins can keep track of network, website, or server activity. They can create an automation profile and orchestrate automatic actions for potential incidents and alerts. The AI-powered anomaly reporting feature uses robust principal component analysis (RPCA) and matrix sketching algorithms to detect any unusual spikes or aberrations in critical performance attributes like response time or CPU usage.

Anomaly detection is commonly divided into two categories:

Univariate anomaly detection: Detects anomalies in a single parameter, such as CPU usage on its own.
Multivariate anomaly detection: Detects aberrations across multiple parameters, whether they are examined separately or as a combination of parameters that depend on one another.
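The distinction can be illustrated with two textbook measures. This sketch does not implement Zia's RPCA or matrix sketching; it substitutes a per-metric z-score (univariate) and a two-metric Mahalanobis distance (multivariate) to show why a reading can look normal on each metric alone yet be anomalous in combination. Function names and sample data are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_score(value: float, history: list) -> float:
    """Univariate: how many standard deviations a value sits from its own history."""
    sd = pstdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd


def mahalanobis_2d(point: tuple, xs: list, ys: list) -> float:
    """Multivariate: distance of (x, y) from the joint distribution of two metrics,
    accounting for how the metrics normally move together."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse covariance [[syy, -sxy], [-sxy, sxx]] / det.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)
```

Suppose requests per minute were 100, 200, 300, 400, 500 and CPU usage was 12, 19, 31, 40, 48 percent at the same times. A reading of 150 requests at 45 percent CPU has a z-score close to 1 on each metric, so neither univariate check fires, but its Mahalanobis distance is about 28.7 because high CPU at low load breaks the usual relationship.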

Our Enterprise AIOps e-book discusses these models in detail.

Let's assume we've installed a new server. We monitor servers by installing a lightweight agent on each one. The agent collects over 60 performance metrics from the server and sends them to secured servers, and the whole process repeats periodically. Our system admins receive an alert when a threshold is breached. Admins can set these thresholds manually or enable Zia-based thresholds.
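The chapter doesn't explain how Zia-based thresholds are computed. As an assumption, this sketch contrasts a manually set static threshold with one learned from the server's own recent samples (a high percentile plus a safety margin); the constants and function names are hypothetical.

```python
from statistics import quantiles

STATIC_CPU_THRESHOLD = 90.0  # hypothetical manually configured threshold, in percent


def learned_threshold(history: list, percentile: int = 95, margin: float = 1.1) -> float:
    """Stand-in for a data-driven threshold: the given percentile of recent
    samples, widened by a safety margin."""
    cut_points = quantiles(history, n=100)  # 99 cut points: 1st..99th percentile
    return cut_points[percentile - 1] * margin


def breaches(sample: float, threshold: float) -> bool:
    """True when a collected metric sample exceeds the threshold."""
    return sample > threshold
```

On a server whose CPU usage normally stays between 10 and 29 percent, a jump to 60 percent slips under the static 90 percent threshold but breaches the learned one (about 31.8 percent for that history), which is the main appeal of thresholds tuned to each server's own baseline.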

The anomaly dashboard alerts the admin that the server is running low on memory space. As a temporary fix, the admin reroutes traffic to another server, raises a request for an additional server, and resolves the issue. Zia can then use historical data to predict space occupancy over the next few months, so the team can plan capacity ahead of time.

After an incident, admins are expected to document details such as the root cause, the deployed fix, and the measures taken to avoid similar incidents. This can be done using the Postmortem feature. Site24x7's NLP-based chatbot can be used to pull up details, and Zia's Postmortem editor helps them document the incident quickly.

Outage prediction

Given a few recorded episodes of previous outages, the outage prediction engine predicts whether a current abnormal state is likely to lead to an outage, and when. It also explains how it arrived at the prediction. Zoho has filed a provisional patent for outage prediction using AI in the full-stack performance monitoring space.

Consider a recorded outage that started with the application performance monitoring (APM) instances going down, followed by the CPU and then the server. If that pattern repeats, it will very likely lead to another outage. The outage prediction model helps us predict when an outage might occur, how severe it could be, and which infrastructure entities might be affected.
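The pattern-repeat idea can be sketched as an ordered subsequence match against a recorded outage. This is a hypothetical illustration, not Zoho's patented engine: the event names and `min_steps` cutoff are assumptions, and timing and severity estimation are left out.

```python
from typing import Sequence

# Hypothetical record of a past outage: the order in which entities failed.
RECORDED_OUTAGE = ("apm_instances_down", "cpu_saturated", "server_down")


def matched_steps(current_events: Sequence[str], recorded: Sequence[str] = RECORDED_OUTAGE) -> int:
    """Count how many leading steps of the recorded outage appear, in order,
    among the current events (unrelated events in between are ignored)."""
    step = 0
    for event in current_events:
        if step < len(recorded) and event == recorded[step]:
            step += 1
    return step


def predict_outage(current_events, recorded=RECORDED_OUTAGE, min_steps=2) -> dict:
    """Flag a likely outage once enough of the recorded sequence has repeated,
    and report the matched steps as evidence plus the entities at risk next."""
    done = matched_steps(current_events, recorded)
    return {
        "outage_likely": done >= min_steps,
        "evidence": list(recorded[:done]),
        "at_risk": list(recorded[done:]),
    }
```

Returning the matched steps alongside the prediction mirrors the engine's ability to explain why it arrived at a prediction, and the remaining steps point to the entities likely to fail next.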

Security

Cyberattacks keep evolving as cybercriminals get more creative by the day, and attackers can hack into systems within minutes. Traditional AI models need time to train on relevant data before they produce accurate results, and we don't have that luxury when it comes to security threats. That's why traditional approaches fall short in the security domain. We need a faster, more robust approach that can identify hacks, malware, and similar threats and alert the user even without a prior incident to learn from. Unsupervised machine learning has proven effective here: it observes and learns what's normal for each user, classifies anything out of the norm as an anomaly, and alerts the user so they can make appropriate decisions.

User and entity behavior analysis (UEBA)

The UEBA system is a key component of any cybersecurity framework that seeks to detect insider threats. It builds behavioral profiles of users and entities in the organization and assigns a risk score. The system raises an alert when the behavior deviates from the previously established normal baseline. UEBA helps us identify compromised accounts, data exfiltration, malware, and logon anomalies. It serves both as a diagnostic tool and an early warning system.
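A minimal sketch of a per-user risk score, assuming each detected anomaly adds a fixed weight by category and an alert fires once the score crosses a threshold. The weights, cap, threshold, and `RiskProfile` class are hypothetical; the chapter doesn't describe how the UEBA engine actually computes scores.

```python
from dataclasses import dataclass, field

# Hypothetical weights per anomaly category, plus alert threshold and cap.
ANOMALY_WEIGHTS = {"time": 20, "count": 25, "pattern": 30}
ALERT_THRESHOLD = 50
MAX_SCORE = 100


@dataclass
class RiskProfile:
    user: str
    score: int = 0
    anomalies: list = field(default_factory=list)

    def record_anomaly(self, category: str) -> bool:
        """Add the anomaly's weight to the score (capped at MAX_SCORE) and
        report whether the score has reached the alert threshold."""
        self.score = min(MAX_SCORE, self.score + ANOMALY_WEIGHTS[category])
        self.anomalies.append(category)
        return self.score >= ALERT_THRESHOLD
```

A single off-hours login raises the score without alerting, but a second, different anomaly pushes it over the threshold, which is how a score-based approach separates one-off oddities from accumulating risk.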

ML-powered UEBA detects anomalies and categorizes them by:

Time

A user accessing their device at 3am, when their usual access time is between 8am and 7pm, could be an indication that their account has been compromised.

Count

An abnormal number of events within a specific period falls under the count category, for example, a large number of print requests or an unusual increase in file downloads.

Pattern

Events that are abnormal because their attributes don't follow the usual pattern, like a user logging in from a different location or accessing a new machine, fall under the pattern category.
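The three categories can be sketched as checks against a user's baseline. In the real system an ML model learns the baseline; here it is supplied directly, and the `Baseline` and `Event` fields and the `classify` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    usual_hours: range           # e.g. range(8, 19) covers 8:00 to 18:59
    max_downloads_per_hour: int
    known_locations: set
    known_machines: set


@dataclass
class Event:
    hour: int                    # hour of day, 0-23
    downloads_last_hour: int
    location: str
    machine: str


def classify(event: Event, baseline: Baseline) -> list:
    """Return the anomaly categories (time, count, pattern) this event falls into."""
    categories = []
    if event.hour not in baseline.usual_hours:
        categories.append("time")
    if event.downloads_last_hour > baseline.max_downloads_per_hour:
        categories.append("count")
    if event.location not in baseline.known_locations or event.machine not in baseline.known_machines:
        categories.append("pattern")
    return categories
```

With a baseline of 8am to 7pm, a 3am login returns `["time"]`, a burst of downloads returns `["count"]`, and a login from a new location returns `["pattern"]`; one event can fall into several categories at once.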

Details of anomalous behavior based on users and entities in the network are displayed on the UEBA dashboard. It gives users greater visibility into threats with its score-based risk assessment and provides an overview of the organization's security posture. This approach helps us determine which threats actually merit investigation.

Imagine a user accesses a sensitive file not required for their role on a Sunday. This suggests both pattern and time anomalies. It could be harmless: the employee may simply have been curious about what was marked Confidential. However, this is a business, and we can't afford to give users the benefit of the doubt when it comes to security. The UEBA system flags this activity as suspicious and increases the user's risk score.

An event may be anomalous compared with a specific user's past behavior, yet normal compared with the behavior of similar users. How can we be sure it's a security concern? In those cases, we use peer grouping. Peer group analysis places users who share similar characteristics in one group, and the risk score generated through UEBA is validated against the baseline of the peer group the user belongs to. This can reduce false positives.
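A minimal sketch of the peer check, assuming an event is flagged only when it deviates by more than `k` standard deviations from both the user's own history and the peer group's values. The deviation measure, `k`, and the function names are assumptions, not the UEBA engine's actual validation logic.

```python
from statistics import mean, pstdev


def deviation(value: float, baseline: list) -> float:
    """Absolute distance from the baseline mean, in standard deviations."""
    sd = pstdev(baseline)
    center = mean(baseline)
    if sd == 0:
        return 0.0 if value == center else float("inf")
    return abs(value - center) / sd


def flag_after_peer_check(value: float, user_history: list, peer_values: list, k: float = 3.0) -> bool:
    """Flag only when the value is unusual for the user AND for their peer group."""
    return deviation(value, user_history) > k and deviation(value, peer_values) > k
```

Consider an employee who used to download about 5 files a day and recently joined a team whose members download around 40. Forty downloads look extreme against the employee's own history but ordinary for the peer group, so the event isn't flagged; 400 downloads would stand out against both and would be.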

The UEBA system is highly customizable, giving us complete control over the types of anomalies it detects. It adapts to changing data patterns automatically and can be deployed in any domain, as long as the types of anomalies to detect are configured correctly. The ZLabs-ML team has filed a provisional patent on the design of the UEBA engine currently deployed in our log management solution.

Bot detection

Web bots account for a large share of web traffic and have been a persistent problem for companies. Many bots try to scrape a website's content, which leads to stolen content. Bot management solutions give admins insight into, and control over, who visits their pages.

Our website optimization tool uses a bot detection suite. It first collects fingerprint attributes from the visitor and checks whether they're consistent with one another. If they're inconsistent, we can conclude the visitor is a bot. Otherwise, behavioral metrics like pointer movement, scrolling speed, and backspace use are collected, and ML techniques are applied to determine whether the visitor is a bot. The suite helps prevent bot-driven problems like DDoS attacks, spam submitted through web forms, wasted bandwidth, and intensive scraping (such as price scraping). It also makes it easier to collect data on genuine, non-bot traffic.
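The two-stage flow can be sketched as follows. The fingerprint fields are hypothetical examples of attributes that should agree for a real browser, and the hand-weighted behavior score stands in for the ML techniques the suite actually applies; all names, weights, and thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fingerprint:
    ua_os: str            # operating system claimed in the User-Agent header
    platform_os: str      # operating system reported by the browser at runtime
    claims_mobile: bool   # User-Agent says it is a mobile device
    touch_support: bool   # browser reports a touch screen


@dataclass
class Behavior:
    pointer_moves: int
    scroll_speed_px_s: float
    backspaces: int


def fingerprint_consistent(fp: Fingerprint) -> bool:
    """Stage 1: attributes that should agree with each other for a real browser."""
    return fp.ua_os == fp.platform_os and (fp.touch_support or not fp.claims_mobile)


def behavior_bot_score(b: Behavior) -> float:
    """Stage 2 stand-in for an ML classifier; higher means more bot-like."""
    score = 0.0
    if b.pointer_moves == 0:
        score += 0.5
    if b.scroll_speed_px_s > 5000:
        score += 0.3
    if b.backspaces == 0:
        score += 0.2
    return score


def is_bot(fp: Fingerprint, b: Behavior, threshold: float = 0.6) -> bool:
    if not fingerprint_consistent(fp):
        return True  # inconsistent fingerprint: treated as a bot outright
    return behavior_bot_score(b) >= threshold
```

Running the cheap consistency check first means the behavioral analysis only has to handle visitors whose fingerprints look plausible.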

CAPTCHA validation

The CAPTCHA validation system is another way to identify bots. It aims to differentiate between real users and automation tools by presenting challenges that are easy for humans but difficult for computers to solve. However, as automated solvers get better, bots are passing these challenges more frequently. This is where AI comes in.

The key is that humans and automated object recognition tools perceive images differently. Human vision has an inherent shape bias, while object recognition tools tend to have a texture bias. For example, humans may identify a butterfly by the shape of its wings, while the tools recognize it by its texture. Our CAPTCHA engine uses neural style transfer to exploit this difference.

There are two variants:

Option-based:

Take an image that looks like Van Gogh's painting "The Starry Night" but actually depicts a dolphin. The dolphin is the content image and the painting is the style image. We render the dolphin in the style of Van Gogh's painting, which confuses automation tools. The user is presented with this neural style transferred image and asked to choose the option that best describes it.

Slider-based:

This works like a jigsaw puzzle. The user is shown a neural style transferred image with two empty spaces, one missing piece, and a range slider, and has to use the slider to position the piece over the correct space.
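The chapter doesn't describe how the slider answer is checked. As an assumption, a server-side check might store the x-offset of the correct gap under a one-time token and accept a submission within a small pixel tolerance; the `SliderCaptcha` class and its tolerance are hypothetical, and the style transfer step itself is out of scope here.

```python
import secrets


class SliderCaptcha:
    """Hypothetical server-side verifier for a slider CAPTCHA."""

    def __init__(self, tolerance_px: int = 6):
        self.tolerance_px = tolerance_px
        self._answers = {}  # one-time token -> x-offset of the correct gap

    def register(self, correct_x: int) -> str:
        """Store where the correct gap was rendered and return a one-time token."""
        token = secrets.token_urlsafe(16)
        self._answers[token] = correct_x
        return token

    def verify(self, token: str, submitted_x: int) -> bool:
        """Accept only a first attempt that lands within tolerance of the correct gap."""
        correct_x = self._answers.pop(token, None)  # consumed on first use
        if correct_x is None:
            return False
        return abs(submitted_x - correct_x) <= self.tolerance_px
```

Dropping the piece on the decoy space fails because it is outside the tolerance around the correct gap, and consuming the token on the first attempt stops a bot from simply sweeping the slider across every position.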

Malware

Malware (malicious software) is a broad term for any software or code designed to infect a system or network. It may steal data, gain unauthorized access to sensitive information or a network, or disrupt business operations.

There are two types of malware analysis:

Static analysis:

The file is examined without being executed. The model is trained to distinguish between what a normal file isn't supposed to do and what a malware file likely does.

Dynamic analysis:

The files are executed inside a protected testing environment (a sandbox) where all URL calls and contacted domains can be inspected, and this observed behavior is used to identify malware files.

Final thoughts

With AI, our workforce performs many routine tasks more easily, quickly, and intelligently than ever before. AI has impacted pretty much every department at Zoho Corp, which is why we have ZLabs, a team dedicated to fine-tuning our existing AI functions and developing new ones. The team is currently working on features like video analytics for internal operations, cross-lingual services, and speech recognition for our business and IT suite of applications. Our goal is to help users accomplish more tasks more easily, and AI plays a pivotal role in that.

About the author

Mahanya is a content writer who specializes in IT stories, documenting the journey of enterprises like ManageEngine, including their ups and downs, internal processes, and core principles. She is keenly interested in interacting with IT thought leaders to get their perspective on digital transformation. A true zillennial at heart, she spends her spare time on social media finding homes for rescue dogs.