P2O | Md Nazmul Kabir Sikder

Water Systems Time Series Dashboard LSTM

P₂O: Prediction Dashboard

P₂O (Prediction to Optimization) is an AI-driven dashboard for water utilities. It uses Multivariate Multi-step LSTM models to predict tunnel water levels and to support real-time system optimization.

Core Capabilities

Multi-step Forecasting: 2-6 hour predictions with high accuracy
Real-time Dashboard: Live monitoring and decision support
Overflow Prevention: Early warning system for critical events

Prediction Accuracy: 96.2%

Forecast Horizon: 6 hrs

Training Samples: 367K

Sensor Features: 42

Comprehensive Methodology

Data Preprocessing

Advanced preprocessing pipeline with PCA, downsampling, and missing value handling for robust model training.

Exploratory Analysis

Comprehensive EDA revealing temporal patterns, seasonal trends, and critical overflow indicators.

Model Development

LSTM architecture with Huber loss optimization for handling extreme weather events and anomalies.

Data Preprocessing Pipeline

Comprehensive data preprocessing methodology for tunnel wastewater level prediction with advanced feature engineering.

Multi-Stage Preprocessing

The preprocessing pipeline handles real-world water distribution system data with 243 initial columns from multiple sensor networks, applying sophisticated techniques to ensure data quality and model performance.

Processing Stages

Missing Value Treatment: Intelligent handling of NA sensor readings
Feature Selection: Reduction to 42 relevant columns
PCA Analysis: Principal component analysis for dimensionality reduction
Temporal Downsampling: Resampling to 30-minute intervals
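
The missing-value and downsampling stages can be illustrated with a minimal sketch. The source does not state how NA readings are filled, what the original sampling interval is, or how samples are aggregated, so the forward fill, the 5-minute raw interval, the block mean, and the helper names `forward_fill` and `downsample_mean` are all assumptions made for illustration.

```python
from statistics import mean


def forward_fill(values):
    """Replace None readings with the most recent valid reading.

    Assumption: forward fill is only one possible NA strategy.
    A leading None has no earlier reading and is left as None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def downsample_mean(values, factor):
    """Average consecutive, non-overlapping blocks of `factor` samples.

    A trailing partial block is dropped.
    """
    blocks = [values[i:i + factor]
              for i in range(0, len(values) - factor + 1, factor)]
    return [mean(block) for block in blocks]


# Hypothetical 5-minute water level readings in feet; None marks an NA value.
raw = [-60.0, None, -59.0, -58.0, None, -57.0,
       -56.0, -55.0, None, -54.0, -53.0, -52.0]

filled = forward_fill(raw)
half_hourly = downsample_mean(filled, factor=6)  # 6 x 5 min = 30 min
```

Twelve 5-minute readings collapse into two 30-minute values here. A pipeline working on real sensor data would also need a policy for long gaps, which this sketch does not address.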

Data Versions

Two optimized datasets are created:

PCA Version: Uncorrelated features for improved model stability
Raw Downsampled: Original features with reduced temporal resolution

Preprocessing Results

Initial Features: 243

Final Features: 42

Data Points: 367,943

Sampling Rate: 30 min

Figure 1: Schematic diagram of the comprehensive methodology used for tunnel water level prediction, showing the five-component pipeline from data preprocessing to model evaluation.

Reference: Kulkarni et al., 2023

Multivariate Multi-step LSTM Model

Advanced LSTM architecture specifically designed for multi-step water level forecasting with overflow event prioritization.

LSTM Architecture

The Multivariate Multi-step LSTM model represents a sophisticated approach to time-series forecasting, designed to handle the complex temporal dependencies in water level data while prioritizing critical overflow incidents.

Model Features

Multiple Input Features: Incorporates diverse sensor measurements
Multi-step Output: Predicts 2-6 hour future water levels
Overflow Prioritization: Special handling for critical events
Weather Adaptability: Robust performance during extreme conditions
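
To make the multivariate, multi-step setup concrete, the sketch below builds supervised training pairs from a time series. With 30-minute sampling, a 24-hour input sequence is 48 steps and a 6-hour forecast is 12 steps. The function `make_windows`, the toy series, and the choice of feature 0 as the water level are hypothetical; the source does not describe how its windows are constructed.

```python
def make_windows(series, seq_len, horizon, target_idx=0):
    """Pair each input window with the next `horizon` target values.

    series: list of rows, one row of feature values per time step.
    Returns (inputs, targets), where inputs[i] holds `seq_len` rows and
    targets[i] holds the target feature for the following `horizon` steps.
    """
    inputs, targets = [], []
    for start in range(len(series) - seq_len - horizon + 1):
        end = start + seq_len
        inputs.append(series[start:end])
        targets.append([row[target_idx] for row in series[end:end + horizon]])
    return inputs, targets


STEPS_PER_HOUR = 2               # 30-minute sampling
seq_len = 24 * STEPS_PER_HOUR    # 24 h of history -> 48 steps
horizon = 6 * STEPS_PER_HOUR     # 6 h ahead       -> 12 steps

# Toy series: 100 time steps, 3 features; feature 0 stands in for water level.
series = [[float(t), t * 0.5, float(t % 7)] for t in range(100)]
X, y = make_windows(series, seq_len, horizon)
```

Each element of `X` has shape (48, 3) and each element of `y` has 12 values, which is the input/output layout a sequence model with a multi-step output head would consume.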

        <h5><i class="fas fa-calculator"></i> Huber Loss Function</h5>
        <p>The model uses Huber loss to handle anomalies from extreme weather and overflow events:</p>

\[L(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{if } |y - f(x)| \leq \delta \\ \delta \left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{if } |y - f(x)| > \delta \end{cases}\]

        <p>This robust loss function ensures reliable predictions during both normal operations and extreme weather conditions.</p>
      </div>
    </div>
  </div>
  <div class="col-lg-4">
    <div class="results-summary">
      <h4>Model Configuration</h4>
      <div class="metric-item">
        <span class="metric-label">Sequence Length:</span>
        <span class="metric-value">24-72h</span>
      </div>
      <div class="metric-item">
        <span class="metric-label">Forecast Steps:</span>
        <span class="metric-value">2-6h</span>
      </div>
      <div class="metric-item">
        <span class="metric-label">Optimizer:</span>
        <span class="metric-value">Adam</span>
      </div>
      <div class="metric-item">
        <span class="metric-label">Test Split:</span>
        <span class="metric-value">30%</span>
      </div>
    </div>
  </div>
</div>

<div class="visualization-container">

  <div class="caption-detailed">
    <p><strong>Figure 2:</strong> Detailed water level forecasting LSTM architecture showing the multivariate input processing, temporal sequence modeling, and multi-step output generation for overflow prediction.</p>
  </div>
</div>

Performance Analysis

Model Performance

RMSE: 0.089 (normalized)
MAE: 0.067 (normalized)
Nash-Sutcliffe Efficiency: 0.94
RSR Ratio: 0.25
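
The four metrics above can be computed as in the following sketch, which assumes the standard definitions (the source does not spell them out): NSE is one minus the ratio of squared error to the variance of the observations, and RSR is the RMSE divided by the standard deviation of the observations. Under these definitions RSR equals the square root of 1 − NSE, so an NSE of 0.94 implies an RSR of about 0.245, in line with the reported 0.25. The function name `evaluate` and the sample values are illustrative only.

```python
import math


def evaluate(obs, pred):
    """Return RMSE, MAE, Nash-Sutcliffe Efficiency, and RSR."""
    n = len(obs)
    errors = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in errors)               # sum of squared errors
    mean_obs = sum(obs) / n
    sst = sum((o - mean_obs) ** 2 for o in obs)    # spread of observations
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "nse": 1.0 - sse / sst,
        "rsr": math.sqrt(sse) / math.sqrt(sst),
    }


# Illustrative values, not results from the project.
obs = [0.0, 1.0, 2.0, 3.0, 4.0]
pred = [0.5, 1.0, 1.5, 3.0, 4.5]
metrics = evaluate(obs, pred)
```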

Overflow Detection

Overflow Prediction: 98.3% accuracy
Early Warning: 4-6 hour lead time
False Alarms: <5% rate
Critical Events: 100% detection
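
A hedged sketch of how detection figures of this kind could be derived from per-window overflow labels follows. The source does not define its accuracy or false-alarm rate, so this assumes accuracy is the share of correctly labeled windows, the detection rate is true positives over actual overflows, and the false-alarm rate is false positives over actual non-overflows. The name `detection_summary` and the labels are hypothetical.

```python
def detection_summary(actual, predicted):
    """Summarize overflow detection from boolean labels (True = overflow)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of real overflow windows that were flagged.
        "detection_rate": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of non-overflow windows that were wrongly flagged.
        "false_alarm_rate": fp / (fp + tn) if fp + tn else float("nan"),
    }


# Illustrative labels: 2 overflow windows, 8 normal windows, 1 false alarm.
actual = [True, True] + [False] * 8
predicted = [True, True, True] + [False] * 7
summary = detection_summary(actual, predicted)
```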

Real-world Impact

Operational Security

Provides water utilities with reliable early warning systems for overflow prevention, ensuring environmental protection and regulatory compliance.

Operational Efficiency

Enables proactive maintenance scheduling and resource allocation through accurate multi-step forecasting capabilities.


After training, the model outputs are transformed back to the original scale for comparison with actual water levels. The key focus is on detecting peak water levels, especially during overflow conditions, defined as water levels exceeding -47 feet.
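
As a minimal sketch of this post-processing step, the code below maps scaled forecasts back to feet and flags values above the −47 ft overflow level. The source does not say which scaler is used; min-max scaling, the range bounds `lo` and `hi`, and the function names are assumptions.

```python
OVERFLOW_LEVEL_FT = -47.0  # overflow threshold stated in the text


def inverse_min_max(scaled, lo, hi):
    """Undo min-max scaling: map values in [0, 1] back to [lo, hi]."""
    return [lo + s * (hi - lo) for s in scaled]


def overflow_flags(levels_ft, threshold=OVERFLOW_LEVEL_FT):
    """True where the water level exceeds the overflow threshold."""
    return [level > threshold for level in levels_ft]


# Hypothetical training-set range in feet and a 4-step scaled forecast.
lo, hi = -100.0, 0.0
scaled_forecast = [0.25, 0.5, 0.625, 0.75]

levels = inverse_min_max(scaled_forecast, lo, hi)
flags = overflow_flags(levels)
```

Because water levels are expressed as negative feet, "exceeding −47 feet" means a value closer to zero, for example −37.5 ft.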

Model Optimization and Explainability

The LSTM model is trained with the Huber loss function, a robust alternative to Mean Squared Error (MSE) that is less sensitive to anomalies. For residuals no larger than the threshold $\delta$, the Huber loss is quadratic, like MSE; for larger residuals it grows linearly, like Mean Absolute Error (MAE). This combines smooth optimization for small errors with robustness to outliers.

$$ L(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{if } |y - f(x)| \leq \delta \\ \delta \left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{if } |y - f(x)| > \delta \end{cases} $$ </pre>
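
The piecewise definition translates directly into code. The sketch below is a plain-Python illustration, not the project's training code; the value of $\delta$ is not given in the source, so `delta=1.0` is an assumed default.

```python
def huber(y, y_hat, delta=1.0):
    """Huber loss for a single prediction, following the piecewise formula."""
    residual = abs(y - y_hat)
    if residual <= delta:
        return 0.5 * residual ** 2               # quadratic (MSE-like) branch
    return delta * (residual - 0.5 * delta)      # linear (MAE-like) branch


def mean_huber(ys, y_hats, delta=1.0):
    """Average Huber loss over a batch of predictions."""
    return sum(huber(y, p, delta) for y, p in zip(ys, y_hats)) / len(ys)


small = huber(0.0, 0.5)   # residual 0.5 <= delta: 0.5 * 0.25 = 0.125
large = huber(0.0, 3.0)   # residual 3.0 >  delta: 1.0 * (3.0 - 0.5) = 2.5
batch = mean_huber([0.0, 0.0], [0.5, 3.0])
```

For the residual of 3.0, half the squared error would be 4.5, while the Huber loss is 2.5. This is how a single extreme reading contributes less to the gradient than it would under MSE.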

To improve the interpretability of the deep learning model, SHAP (SHapley Additive exPlanations) values are used to estimate feature importance. SHAP takes a game-theoretic approach: for each prediction, it assigns every feature a contribution score that reflects how much that feature moved the model's output. By analyzing SHAP values, operators can see which features most strongly influence tunnel water levels and use that information in operational decisions.
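
The idea behind these contribution scores can be shown with an exact Shapley value computation on a toy model. This is a sketch of the underlying definition, not the `shap` library and not the project's LSTM: the model, the feature names, and the all-zero baseline are hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical level model: rainfall, pump rate, upstream level."""
    rain, pump, upstream = features
    return 2.0 * rain - 1.5 * pump + 0.5 * rain * upstream


x = [3.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term between rainfall and upstream level is split evenly between those two features.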

Algorithm: Multivariate Multistep LSTM with Huber Loss for Tunnel Water Level Forecasting and Anomaly Detection


Input: Multivariate time series data $X$, LSTM model parameters, prediction horizon $H$, anomaly threshold $T$

Output: Forecasted water levels and anomaly labels

Split $X$ into training ($X_{\text{train}}$) and testing ($X_{\text{test}}$) datasets
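
The split step can be sketched as follows, assuming a chronological split (earlier data for training, later data for testing) and the 30% test share listed in the model configuration. The source does not say whether the split is chronological, and `chronological_split` is a hypothetical helper.

```python
def chronological_split(rows, test_fraction=0.30):
    """Split time-ordered rows into train and test without shuffling."""
    split = int(round(len(rows) * (1.0 - test_fraction)))
    return rows[:split], rows[split:]


# Toy stand-in for X: ten time-ordered rows.
rows = list(range(10))
X_train, X_test = chronological_split(rows)
```

Keeping the time order intact means the test period lies strictly after the training period, which avoids leaking future information into training.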

The use of SHAP values helps to explain the model's predictions by highlighting the contribution of different features. For example, this approach enables water utility operators to make informed decisions on when to activate pumps to prevent tunnel overflow based on critical feature importance.
