Case Study | IT Defense for Tertiary A Hospitals: Stable Medical Services Guaranteed
I. Project Background
As digital transformation in healthcare accelerates, core hospital business systems such as HIS, EMR, and PACS depend increasingly on databases. Facing a surge in business volume and rising operation and maintenance (O&M) costs, a tertiary grade A hospital found that its existing database monitoring system covered too little and that fault response was slow. It urgently needed an efficient, unified monitoring solution. Using its Intelligent O&M Platform, Lerwee built an integrated database monitoring system for the hospital, resolving these O&M challenges and safeguarding the stable operation of medical services.
1. Customer Profile
The client in this case is a tertiary grade A general hospital located in a city in East China.
2. Pain Point Analysis
As a key regional medical service provider, the hospital relies on stable IT systems for daily diagnosis and treatment, patient information management, medical data storage, and other critical work. As its business continued to expand, the hospital faced two O&M challenges. First, database monitoring covered only part of the core business systems, so potential faults were not detected in time, and any database failure would severely affect medical service efficiency and patient experience. Second, O&M service costs kept rising: under the decentralized management model, O&M personnel struggled to locate root causes quickly, which prolonged troubleshooting and recovery and further increased the O&M burden.
In response to the above pain points, the hospital put forward clear requirements for the monitoring platform. First, it needed to achieve full-dimensional monitoring of databases supporting key business systems, covering multi-dimensional indicators such as performance, resources, and availability, with the capability to proactively warn of potential faults. Second, it had to break down monitoring barriers across different computer rooms and platforms, forming a unified management portal to facilitate quick problem localization by O&M staff. Finally, the platform should have flexible scalability and personalized configuration capabilities to meet the needs of different O&M roles in the hospital while reducing overall O&M costs.
II. Lerwee’s Solution
(1) Building a Multi-Level Monitoring System and Creating an Integrated Solution
To meet the hospital’s requirements, Lerwee used its Intelligent O&M Platform to build a complete, integrated database monitoring solution covering three areas: monitoring system construction, an intelligent early warning mechanism, and unified portal integration.
1. Multi-Level Database Monitoring Covering All Business Scenarios
The platform established a comprehensive monitoring system spanning the infrastructure layer, database instance layer, and business application layer.
- Infrastructure Layer: It enables real-time monitoring of server indicators such as CPU, memory, disk I/O, and file systems. It supports mainstream operating systems including Windows, Linux, and AIX, as well as domestic operating systems like Kylin and UOS, ensuring the stable operation of underlying hardware resources.
- Database Instance Layer: The platform provides in-depth monitoring for multiple types of databases such as Oracle, MySQL, SQL Server, and PostgreSQL. Taking Oracle database as an example, it supports monitoring SQL statement execution efficiency (e.g., slow SQL, high CPU-consuming SQL), transaction processing status (transaction throughput, deadlocks), and tablespace usage rate. It also conducts real-time monitoring of key components such as RAC cluster status, DG (Data Guard) status, and ASM volumes.


The platform integrates multiple authentication methods including AD and LDAP, and is equipped with standardized external integration interfaces to achieve seamless connection with mainstream monitoring and O&M platforms, fully adapting to the diverse O&M management scenarios of enterprises. In terms of permission management, the platform adopts a three-level management model of organization, role, and user. Through a refined role-based permission control system, it realizes precise management of permission allocation, providing strong support for enterprises to build a secure and controllable O&M management system.
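The organization, role, and user model described above can be illustrated with a minimal sketch. This is not the Lerwee platform's implementation or API. The class names, the permission strings, and the choice to scope roles per organization are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Role:
    """A named set of permissions (hypothetical permission strings)."""
    name: str
    permissions: Set[str] = field(default_factory=set)


@dataclass
class User:
    """A user belongs to one organization and holds role names within it."""
    name: str
    organization: str
    roles: Set[str] = field(default_factory=set)


class AccessControl:
    """Three-level lookup: organization -> role -> permission."""

    def __init__(self) -> None:
        # Roles are scoped per organization: {organization: {role_name: Role}}
        self._roles: Dict[str, Dict[str, Role]] = {}

    def add_role(self, organization: str, role: Role) -> None:
        self._roles.setdefault(organization, {})[role.name] = role

    def is_allowed(self, user: User, permission: str) -> bool:
        # Only roles defined in the user's own organization are considered.
        org_roles = self._roles.get(user.organization, {})
        return any(
            permission in org_roles[name].permissions
            for name in user.roles
            if name in org_roles
        )


acl = AccessControl()
acl.add_role("it-dept", Role("dba", {"db.view", "db.ack_alarm"}))
acl.add_role("it-dept", Role("viewer", {"db.view"}))

dba = User("alice", "it-dept", {"dba"})
viewer = User("bob", "it-dept", {"viewer"})
```

In this sketch a permission is granted only through a role, never directly to a user, so access is controlled in one place.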
2. Intelligent Early Warning and Fault Localization to Improve O&M Response Efficiency
To quickly identify and resolve problems, the platform is configured with a multi-level alarm threshold mechanism. For common fault scenarios such as database slow queries, lock waits, connection pool exhaustion, and insufficient tablespace, different levels of early warning rules are set. When indicators exceed the thresholds, the system automatically triggers alarms. Meanwhile, through correlation analysis technology, the platform can distinguish between database performance issues and underlying infrastructure resource bottlenecks. For example, when business response slows down, it can quickly determine whether the root cause is insufficient SQL statement optimization or server CPU resource shortage, helping O&M personnel accurately locate problems and shorten troubleshooting time.
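As a rough illustration of multi-level thresholds and of separating database causes from infrastructure causes, consider the sketch below. The metric names, the threshold values (80 and 90 percent, 10 slow statements, 90 percent CPU), and the simple comparison used for correlation are assumptions chosen for the example. They are not the platform's actual rules or algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdRule:
    """Two alarm levels for one metric (values are illustrative)."""
    metric: str
    warning: float
    critical: float


def classify(rule: ThresholdRule, value: float) -> Optional[str]:
    """Return the alarm level for a value, or None if it is within limits."""
    if value >= rule.critical:
        return "critical"
    if value >= rule.warning:
        return "warning"
    return None


def likely_cause(slow_sql_count: int, host_cpu_pct: float,
                 slow_sql_limit: int = 10, cpu_limit: float = 90.0) -> str:
    """Very rough correlation: compare one database signal with one host signal."""
    db_hot = slow_sql_count >= slow_sql_limit
    host_hot = host_cpu_pct >= cpu_limit
    if db_hot and host_hot:
        return "both"
    if host_hot:
        return "infrastructure"
    if db_hot:
        return "database"
    return "undetermined"


# Hypothetical rule: warn at 80% tablespace usage, go critical at 90%.
tablespace_rule = ThresholdRule("tablespace_used_pct", warning=80.0, critical=90.0)
```

A real correlation engine would weigh many more signals over time. The sketch only shows the idea of comparing a database-side indicator with a host-side indicator before assigning a probable cause.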

In addition, the platform supports alarm notification suppression. When the number of alarms reaches the storm threshold, it automatically triggers circuit breaker protection to avoid overwhelming O&M personnel with massive alarm messages. It also has an alarm escalation mechanism: if an alarm is not handled within the specified time, the system will automatically escalate it to higher-level O&M personnel to ensure timely response to problems.
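The two mechanisms above, storm suppression and time-based escalation, can be sketched as follows. The sliding-window design, the class and function names, and the numbers used (3 alarms per 60 seconds, a 15-minute escalation step) are assumptions for illustration. The source does not state how the platform implements either mechanism.

```python
from collections import deque
from typing import Deque


class StormBreaker:
    """Suppress notifications once too many were sent within a sliding window."""

    def __init__(self, max_alarms: int, window_seconds: float) -> None:
        self.max_alarms = max_alarms
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()  # timestamps of sent notifications

    def allow(self, now: float) -> bool:
        # Forget notifications that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_alarms:
            return False  # storm threshold reached: suppress this notification
        self._sent.append(now)
        return True


def escalation_level(raised_at: float, now: float,
                     step_seconds: float, max_level: int) -> int:
    """Level 0 is the first responder; each unhandled step moves one level up."""
    elapsed = max(0.0, now - raised_at)
    return min(max_level, int(elapsed // step_seconds))


breaker = StormBreaker(max_alarms=3, window_seconds=60.0)
decisions = [breaker.allow(t) for t in (0.0, 1.0, 2.0, 3.0)]
```

In this run the first three notifications pass and the fourth is suppressed. Timestamps are passed in explicitly so that the behavior is deterministic and easy to test.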
3. Unified Monitoring Portal Integration for Visualized Management
The Lerwee Platform builds a centralized monitoring management portal, which integrates and displays database monitoring data with infrastructure monitoring data such as servers, virtualization, and middleware. The platform provides a unified topology view, intuitively showing the correlation between various IT components to help O&M personnel grasp the overall operation status of the IT architecture. The real-time dashboard supports multiple visualized charts such as line charts, bar charts, and pie charts, dynamically displaying the changing trends of core indicators including CPU utilization, memory usage, and database connection count, making O&M data clear at a glance.

At the same time, the platform supports custom report functions. O&M personnel can generate daily, weekly, and monthly reports as needed, which include information such as alarm statistics, top N indicator rankings, and fault handling status. These reports not only provide data support for daily O&M work but also serve as a decision-making basis for IT system optimization and upgrading.
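A report of the kind described, with alarm statistics and a top N ranking, could be assembled along these lines. The record fields (`level`, `source`) and the sample data are hypothetical. The platform's real report schema is not given in the source.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_alarms(alarms: Iterable[Dict[str, str]], top_n: int = 3) -> Dict[str, Any]:
    """Count alarms per level and rank the noisiest sources."""
    records = list(alarms)  # materialize once so the data can be scanned twice
    by_level = Counter(a["level"] for a in records)
    by_source = Counter(a["source"] for a in records)
    return {
        "total": len(records),
        "by_level": dict(by_level),
        "top_sources": by_source.most_common(top_n),
    }


# Hypothetical alarm records for one reporting period.
sample_alarms = [
    {"level": "warning", "source": "his-db"},
    {"level": "critical", "source": "his-db"},
    {"level": "warning", "source": "his-db"},
    {"level": "warning", "source": "pacs-db"},
    {"level": "warning", "source": "pacs-db"},
    {"level": "critical", "source": "emr-db"},
]
report = summarize_alarms(sample_alarms, top_n=2)
```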
(2) Refined Implementation to Ensure Efficient Project Progress
To ensure the smooth launch and effective operation of the monitoring platform, the Lerwee team formulated a refined implementation plan and promoted project delivery in phases.
1. Scientific Design of Underlying Architecture and Rational Resource Configuration
The platform’s underlying architecture was designed around the number and types of monitored objects in the hospital. A high-availability architecture was adopted for the web servers, collection servers, agent collection servers, and database servers to keep monitoring services uninterrupted. Data is compressed during transmission, and sensitive information is stored in encrypted form to ensure data security. System logs are stored in daily segments, which simplifies log query and archiving.
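To make the transmission compression step concrete, the sketch below packs a metrics sample as compact JSON and compresses it with zlib. The choice of JSON and zlib, the function names, and the sample fields are assumptions. The source does not say which serialization or compression scheme the platform uses, and encryption of sensitive fields is deliberately left out of this sketch.

```python
import json
import zlib
from typing import Any, Dict


def pack_metrics(sample: Dict[str, Any]) -> bytes:
    """Serialize a sample to compact JSON and compress it for transmission."""
    raw = json.dumps(sample, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 6)


def unpack_metrics(payload: bytes) -> Dict[str, Any]:
    """Reverse of pack_metrics: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical sample: 200 repeated CPU readings from one collector.
sample = {"host": "db-01", "metric": "cpu_pct", "values": [50.0] * 200}
packed = pack_metrics(sample)
restored = unpack_metrics(packed)
```

Repetitive monitoring data such as this usually compresses well, which is why compressing between collectors and the central server can reduce bandwidth.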

2. Phased Incorporation of Monitored Objects and Gradual Launch
After platform deployment, monitored objects were brought under platform management in batches. During database onboarding, for example, dedicated monitoring templates were configured for different database types such as SQL Server and Oracle so that monitoring indicators were accurately adapted. For virtualization and operating systems, real-time monitoring of device status was achieved through multiple monitoring methods such as Agent and SNMP. After each batch of monitored objects went live, trial operation tests were conducted and monitoring thresholds and alarm rules were adjusted promptly, ensuring that the platform’s monitoring met the hospital’s requirements.
III. Customer Benefits
The successful implementation of the Lerwee Intelligent O&M Platform has brought significant O&M benefits and business value to the tertiary grade A hospital, mainly reflected in the following three aspects:
1. Reducing O&M Costs and Improving Management Efficiency
The unified monitoring platform eliminates monitoring data silos, providing a consistent monitoring experience and management standards. O&M personnel can complete full-dimensional monitoring without switching between different systems, greatly reducing tool learning and maintenance costs. Meanwhile, the platform’s automated early warning and fault localization functions shorten the database fault detection time from hours to minutes, significantly reducing fault troubleshooting and recovery time, improving O&M response efficiency, and lowering manual O&M costs.
2. Ensuring Business Stability and Improving Medical Service Quality
Through 7×24 real-time monitoring of databases supporting core business systems such as HIS, EMR, and PACS, the platform can identify potential faults in advance and issue timely warnings, effectively preventing business interruptions caused by database problems. For example, when the monitored usage rate of a business tablespace approaches the threshold, the system automatically triggers an alarm, prompting O&M personnel to expand capacity in a timely manner. This avoids issues such as inability to save electronic medical records and patient registration failures due to insufficient tablespace, ensuring the continuity and stability of medical services and enhancing the patient experience.
3. Enhancing Data Security and Protecting Patient Privacy
In addition to monitoring database performance and operating status, the platform also conducts real-time monitoring of database access behavior and backup status. When abnormal database access or backup failures occur, the system immediately triggers an alarm, allowing O&M personnel to intervene in a timely manner. This effectively prevents security risks such as data leakage and data loss, provides strong protection for patient privacy data security, and helps the hospital achieve compliant operations.
Summary
With its comprehensive monitoring capabilities, efficient fault handling mechanism, and flexible personalized configuration, the Lerwee Intelligent O&M Platform has successfully resolved the IT O&M challenges of the hospital client. In the future, Lerwee will continue to focus on the healthcare industry, continuously optimize product functions based on the characteristics and needs of medical IT systems, provide high-quality O&M monitoring solutions for more medical institutions, and contribute to the steady progress of digital transformation in the healthcare industry.