Why Visibility Matters in Digital Experience Monitoring
Today, nearly every task we do at work—whether collaborating, creating, or communicating—happens in a digital workspace. Devices have become an extension of the modern employee, and their performance directly shapes how work gets done. Just as we prioritize regular health checkups for ourselves, it’s critical to routinely assess the health of our endpoints.
But with a growing fleet of devices spread across geographies and hybrid work environments, manual checkups are no longer practical—or scalable. IT teams need a better way to understand how devices perform and how they impact user experience.
That’s where digital experience monitoring begins—with visibility. By collecting and analyzing real-time endpoint telemetry, IT gains deep insight into key performance indicators like CPU usage, memory consumption, boot time, crash frequency, and more. This visibility lays the foundation for identifying experience issues early, reducing support tickets, and ultimately improving employee productivity.
Visibility Starts with Telemetry Data
You can’t improve what you can’t measure. Telemetry is the foundational layer of digital experience monitoring—it transforms invisible device behavior into actionable insight. By continuously collecting real-time data points like CPU usage, memory load, boot times, system crashes, and more, IT teams gain deep operational awareness into every endpoint, regardless of where it’s located.
But telemetry is more than just monitoring. It enables proactive IT by surfacing early warning signals before they snowball into user complaints or productivity issues. With the right data in place, teams can detect friction, automate responses, and ensure smooth performance—all before the employee even notices. In short, telemetry turns visibility into control, making it the backbone of modern endpoint monitoring and experience management.
How DEX Manager Plus Collects and Uses Endpoint Telemetry
DEX Manager Plus uses a lightweight agent that sits on end-user devices and operates silently in the background without impacting performance. This agent continuously collects high-fidelity telemetry data from every managed endpoint, whether on-site, remote, or hybrid, giving IT teams a real-time pulse on the employee experience. The agent collects data around the clock, even when the device is offline. Critical and alert-related data is then posted to the server for further analysis.
The agent captures a rich stream of telemetry on the factors that directly influence user productivity, device health, and digital experience quality. This telemetry can be broadly classified into two categories:
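The offline behaviour described above can be pictured as a store-and-forward buffer. The following is a minimal sketch under stated assumptions, not the agent's actual implementation: the names `Sample`, `TelemetryBuffer`, the `critical` flag, and the `send` callback are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Sample:
    metric: str
    value: float
    critical: bool  # hypothetical flag: True when the sample is alert-related


class TelemetryBuffer:
    """Keeps every sample locally and forwards critical ones once online."""

    def __init__(self, send: Callable[[Sample], None], max_pending: int = 1000):
        self._send = send  # hypothetical upload callback
        # Bounded queue: when full, the oldest pending sample is dropped first.
        self._pending: Deque[Sample] = deque(maxlen=max_pending)
        self.local_log: List[Sample] = []

    def record(self, sample: Sample, online: bool) -> None:
        self.local_log.append(sample)      # collection continues while offline
        if sample.critical:
            self._pending.append(sample)   # only critical samples are queued for upload
        if online:
            self.flush()

    def flush(self) -> None:
        # Remove a sample from the queue only after it was sent successfully.
        while self._pending:
            self._send(self._pending[0])
            self._pending.popleft()
```

In this sketch, samples recorded while `online` is false stay queued and are delivered in order the next time a sample is recorded with a connection available.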
- Built-in endpoint metrics monitored out-of-the-box
- Custom telemetry collected using user-defined data collectors
Let’s explore each in detail:
Built-in Endpoint Metrics
DEX Manager Plus tracks a curated set of high-impact metrics that provide deep visibility into how well endpoints perform and how they affect end-user experience. These metrics are grouped into four key categories:
- Application Reliability: Identifies app-related issues like crashes
- Device Performance: Monitors CPU, memory, GPU, and disk usage to ensure smooth and responsive operation
- Device Reliability: Tracks hardware health, battery condition, warranty status, and system stability
- Device Responsiveness: Measures user-facing delays like boot time, logon duration, and input lag
These foundational metrics help IT teams spot issues early, prioritize support, and optimize endpoint experience across the workforce.
The table below lists the monitored metrics and their impact on experience. Most metrics have configurable thresholds that admins can set to identify system degradation, so the table also suggests a best-practice threshold for each metric as a starting point for IT teams beginning their experience management journey:
| Category | Metric Monitored | Impact on Experience | Best-Practice Threshold (Alert If) |
|---|---|---|---|
| Application Reliability | Application Crash Events | Crashing applications interrupt work and reduce user trust in IT | All application crash events are monitored |
| Device Performance | Free Disk Space | Low disk space causes slowness, failed updates, and app crashes | Free disk space is less than 10 GB |
| Device Performance | Free Disk Space (OS Drive) | OS instability and failed operations due to lack of system drive space | Free OS drive space is less than 10 GB |
| Device Performance | CPU Usage | High CPU leads to slow response times and unresponsive apps | CPU usage exceeds 70% for 5–10 minutes |
| Device Performance | Memory Usage | High memory usage causes lags, freezes, and app crashes | Memory usage exceeds 50% for 5 minutes |
| Device Performance | Memory Swap Rate | Indicates system is using disk instead of RAM, leading to performance dips | Swap rate exceeds 5000 pages for 10 minutes |
| Device Performance | Memory Swap Size | Excessive swap size signals memory overuse and degraded speed | Swap size exceeds 75% for 10 minutes |
| Device Performance | CPU Interrupt | High interrupts may indicate hardware faults or driver issues | Interrupts exceed 2% of CPU for 5 minutes |
| Device Performance | GPU Usage | High GPU load may slow down graphics-intensive apps, video calls, or design tools | GPU usage exceeds 75% for 10 minutes |
| Device Performance | Disk Queue Length | Long disk queue length causes delays in read/write operations | Average queue length exceeds 1 for 10 minutes |
| Device Reliability | Battery Health | Poor battery health reduces portability and increases user frustration | Battery health is below 25–30% (approximately 70–75% wear) |
| Device Reliability | Warranty | Out-of-warranty devices carry repair risks and cost implications | Warranty expires in 30–60 days |
| Device Reliability | Device Age | Older devices typically underperform newer ones and are prone to failure | Device age exceeds 3–5 years |
| Device Reliability | Hard Reset | Frequent hard resets may point to deeper system issues or user frustration | All hard resets are monitored; alert if more than 2 hard resets occur within a 7-day period |
| Device Reliability | System Crash | System crashes result in data loss and disrupted productivity | All system crashes are monitored |
| Device Responsiveness | Boot Time | Long boot times cause delays at the start of the workday | Boot time exceeds 60 seconds |
| Device Responsiveness | Extended Logon Time | Slow logons hinder user access and readiness to work | Logon time exceeds 60 seconds |
| Device Responsiveness | Max Input Delay | High input delay leads to laggy user interactions and frustration | Input delay exceeds 500 ms for 5–10 minutes |
Premises for the best practice thresholds
- CPU, memory, and disk usage thresholds near 85–90% are common industry defaults that flag real-world performance issues without generating alert noise.
- Disk space warnings at less than 10 GB or less than 10% free prevent common failure modes while still leaving room for operating overhead.
- Memory alerts, especially when available RAM drops below 10%, signal impending swapping and slowdowns.
- Durations matter: sustained usage over a window is much more meaningful than a brief spike.
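The point about durations can be made concrete with a small sketch of a sustained-threshold check, for example "CPU usage exceeds 70% for 5 minutes". This is a simplified illustration, not the product's detection logic; the function name `sustained_breach` and the `(timestamp, value)` sample format are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    duration_s: float,
) -> bool:
    """Check whether a metric stays above `threshold` for at least `duration_s`.

    `samples` are (timestamp_in_seconds, value) pairs in chronological order.
    """
    breach_start: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts          # first sample of a new breach
            if ts - breach_start >= duration_s:
                return True                # breach lasted long enough
        else:
            breach_start = None            # any dip resets the window
    return False
```

A single spike above the threshold never returns True here, because the elapsed time since the start of the breach stays below the required duration.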
Custom Telemetry with User-Defined Data Collectors
While built-in telemetry covers a broad spectrum of critical device signals, every organization has unique needs based on their environment, workflows, and employee tools. That’s where custom telemetry comes in.
With user-defined data collectors, IT teams can extend monitoring by defining and collecting custom metrics tailored to their business. Whether the goal is tracking the availability of connected devices and POS peripherals or pulling details from enterprise apps that affect end-user productivity, DEX Manager Plus lets IT create lightweight data collectors using PowerShell or prebuilt templates. Typical uses include:
- Monitor custom hardware sensors
- Query application-specific logs or counters
- Check service health for internal tools
- Track latency or responsiveness for business-critical operations
- Pull registry values, WMI data, or command output
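As a rough illustration of what a latency-tracking collector does, the sketch below times an arbitrary check and emits one JSON record. It is written in Python purely for illustration; in the product, collectors are created with PowerShell or prebuilt templates, and the function name `collect_latency` and the output fields shown here are hypothetical, not the DEX Manager Plus collector format.

```python
import json
import time
from typing import Callable


def collect_latency(name: str, operation: Callable[[], object]) -> str:
    """Run `operation`, time it, and return one JSON record describing the result."""
    start = time.perf_counter()
    try:
        operation()
        status = "ok"
    except Exception as exc:
        # A failing check is still a useful data point, so record it instead of raising.
        status = f"error: {type(exc).__name__}"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return json.dumps(
        {"collector": name, "status": status, "latency_ms": round(elapsed_ms, 2)}
    )
```

The `operation` argument could wrap any business-critical check, such as opening a connection to an internal tool, so the same wrapper produces comparable records across different checks.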
Collected data can be fed into detection and remediation workflows, enabling correlation with core telemetry, alerting, and automated fixes. This gives IT full control over experience monitoring and leaves no blind spots, even in complex or legacy setups.
In essence, custom collectors bridge the gap between standard metrics and your unique digital environment, helping you go beyond out-of-the-box monitoring to achieve true experience observability.
With telemetry collected and thresholds configured, the next step is to get alerted when something goes wrong and to analyze each alert to identify the root cause of the issue. DEX Manager Plus converts raw telemetry into alerts and insights, offering in-depth root cause analysis (RCA).