P2O

AI-driven dashboard for water utilities with Multivariate Multi-step LSTM prediction

Water Systems Time Series Dashboard LSTM

P2O: Prediction Dashboard

P2O (Prediction to Optimization) is an AI-driven dashboard for water utilities. It uses Multivariate Multi-step LSTM models to predict tunnel water levels and to support real-time system optimization.

Core Capabilities

  • Multi-step Forecasting: 2-6 hour predictions with high accuracy
  • Real-time Dashboard: Live monitoring and decision support
  • Overflow Prevention: Early warning system for critical events

Prediction Accuracy: 96.2%
Forecast Horizon: 6 hrs
Training Samples: 367K
Sensor Features: 42

Comprehensive Methodology

Data Preprocessing

Advanced preprocessing pipeline with PCA, downsampling, and missing value handling for robust model training.

Exploratory Analysis

Comprehensive EDA revealing temporal patterns, seasonal trends, and critical overflow indicators.

Model Development

LSTM architecture with Huber loss optimization for handling extreme weather events and anomalies.

Data Preprocessing Pipeline

Comprehensive data preprocessing methodology for tunnel wastewater level prediction with advanced feature engineering.

Multi-Stage Preprocessing

The preprocessing pipeline handles real-world water distribution system data with 243 initial columns from multiple sensor networks. It applies missing-value treatment, feature selection, PCA, and temporal downsampling to ensure data quality and model performance.

Processing Stages
  • Missing Value Treatment: Intelligent handling of NA sensor readings
  • Feature Selection: Reduction to 42 relevant columns
  • PCA Analysis: Principal component analysis for dimensionality reduction
  • Temporal Downsampling: Resampling to 30-minute intervals

Data Versions

Two optimized datasets are created:

  • PCA Version: Uncorrelated features for improved model stability
  • Raw Downsampled: Original features with reduced temporal resolution
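
The missing-value and downsampling stages can be illustrated with a minimal, standard-library sketch. It assumes forward-filling for NA readings and mean aggregation per 30-minute bucket; the source does not state which methods were actually used, and the 10-minute readings, the function names `forward_fill` and `downsample`, and the sample values are hypothetical.

```python
from statistics import mean


def forward_fill(values):
    """Replace None readings with the most recent valid reading (assumed strategy)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def downsample(timestamps_min, values, interval_min=30):
    """Average the readings that fall into the same interval_min-wide bucket."""
    buckets = {}
    for t, v in zip(timestamps_min, values):
        if v is None:  # leading gaps that forward-fill could not resolve
            continue
        bucket_start = t // interval_min * interval_min
        buckets.setdefault(bucket_start, []).append(v)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical 10-minute readings: minutes since start, water level in feet.
timestamps = [0, 10, 20, 30, 40, 50]
levels = [-60.0, None, -58.0, -57.0, None, -55.0]

resampled = downsample(timestamps, forward_fill(levels))
print(resampled)  # one (bucket start, mean level) pair per 30-minute interval
```

A production pipeline would more likely rely on a dataframe library for resampling; the plain-Python version is only meant to make the two steps explicit.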

Preprocessing Results

Initial Features: 243
Final Features: 42
Data Points: 367,943
Sampling Rate: 30 min

Figure 1: Schematic diagram of the comprehensive methodology used for tunnel water level prediction, showing the five-component pipeline from data preprocessing to model evaluation.

Reference: Kulkarni et al., 2023

Multivariate Multi-step LSTM Model

Advanced LSTM architecture specifically designed for multi-step water level forecasting with overflow event prioritization.

LSTM Architecture

The Multivariate Multi-step LSTM model represents a sophisticated approach to time-series forecasting, designed to handle the complex temporal dependencies in water level data while prioritizing critical overflow incidents.

Model Features
  • Multiple Input Features: Incorporates diverse sensor measurements
  • Multi-step Output: Predicts 2-6 hour future water levels
  • Overflow Prioritization: Special handling for critical events
  • Weather Adaptability: Robust performance during extreme conditions

Huber Loss Function

The model uses Huber loss to handle anomalies from extreme weather and overflow events:

\[L(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{if } |y - f(x)| \leq \delta \\ \delta \left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{if } |y - f(x)| > \delta \end{cases}\]

This robust loss function ensures reliable predictions during both normal operations and extreme weather conditions.

Model Configuration

Sequence Length: 24-72h
Forecast Steps: 2-6h
Optimizer: Adam
Test Split: 30%
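
At the 30-minute sampling rate, a 24-72 h sequence length corresponds to 48-144 input steps and a 2-6 h forecast to 4-12 output steps. The sketch below shows one way to build such input/target windows and hold out the last 30% for testing. The chronological (unshuffled) split and the names `make_windows` and `chronological_split` are assumptions, not details confirmed by the source.

```python
SAMPLES_PER_HOUR = 2  # 30-minute sampling rate


def make_windows(series, input_hours, horizon_hours):
    """Slice a series into (input window, multi-step target) pairs.

    Each element of `series` may be a single value or a row of sensor features.
    """
    n_in = input_hours * SAMPLES_PER_HOUR
    n_out = horizon_hours * SAMPLES_PER_HOUR
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def chronological_split(pairs, test_fraction=0.3):
    """Keep the most recent windows for testing (assumed split strategy)."""
    cut = int(len(pairs) * (1 - test_fraction))
    return pairs[:cut], pairs[cut:]


series = list(range(200))  # stand-in for 200 half-hourly readings
pairs = make_windows(series, input_hours=24, horizon_hours=6)
train, test = chronological_split(pairs)
print(len(pairs[0][0]), len(pairs[0][1]), len(train), len(test))
```

Splitting by time rather than at random keeps future readings out of the training set, which matters for any forecasting evaluation.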

Figure 2: Detailed water level forecasting LSTM architecture showing the multivariate input processing, temporal sequence modeling, and multi-step output generation for overflow prediction.

Real-world Impact

Operational Security

Provides water utilities with reliable early warning systems for overflow prevention, ensuring environmental protection and regulatory compliance.

Operational Efficiency

Enables proactive maintenance scheduling and resource allocation through accurate multi-step forecasting capabilities.

After training, the model outputs are transformed back to the original scale for comparison with actual water levels. The key focus is on detecting peak water levels, especially during overflow conditions, defined as water levels exceeding -47 feet.
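
As a rough illustration of this post-processing step, the sketch below maps scaled outputs back to feet and flags values above the -47 ft overflow level. Min-max scaling is an assumption (the source does not name the scaler), and the range bounds and forecast values are hypothetical.

```python
OVERFLOW_LEVEL_FT = -47.0  # overflow threshold stated in the text


def inverse_min_max(scaled, lo, hi):
    """Map values from the scaled range [0, 1] back to the original range [lo, hi]."""
    return [lo + s * (hi - lo) for s in scaled]


def overflow_flags(levels_ft, threshold=OVERFLOW_LEVEL_FT):
    """Return True for every water level that exceeds the overflow threshold."""
    return [level > threshold for level in levels_ft]


# Hypothetical scaled model outputs and hypothetical training-data range in feet.
scaled_forecast = [0.10, 0.50, 0.90]
levels = inverse_min_max(scaled_forecast, lo=-100.0, hi=-40.0)
flags = overflow_flags(levels)
print(list(zip(levels, flags)))
```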

Model Optimization and Explainability

The LSTM model is optimized using the Huber loss function, a robust alternative to Mean Squared Error (MSE) that is less sensitive to anomalies. Huber loss is quadratic, like MSE, for residuals up to a threshold $\delta$ and linear, like Mean Absolute Error (MAE), beyond it. This keeps optimization smooth near zero error while limiting the influence of large errors.

$$ L(y, f(x)) = \begin{cases} \frac{1}{2}(y - f(x))^2 & \text{if } |y - f(x)| \leq \delta \\ \delta \left(|y - f(x)| - \frac{1}{2}\delta\right) & \text{if } |y - f(x)| > \delta \end{cases} $$
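
The piecewise definition translates directly into code. The following is a minimal sketch for scalar values; the default `delta=1.0` is an arbitrary choice for illustration, not a value taken from the source.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss for a single prediction, following the piecewise definition."""
    residual = abs(y_true - y_pred)
    if residual <= delta:
        return 0.5 * residual ** 2            # quadratic (MSE-like) branch
    return delta * (residual - 0.5 * delta)   # linear (MAE-like) branch


def mean_huber_loss(y_true, y_pred, delta=1.0):
    """Average Huber loss over paired sequences of targets and predictions."""
    losses = [huber_loss(t, p, delta) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


print(huber_loss(0.0, 0.5))  # small residual: quadratic branch
print(huber_loss(0.0, 3.0))  # large residual: linear branch
```

The two branches meet at a residual of exactly $\delta$, where both give $\frac{1}{2}\delta^2$, so the loss stays continuous.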

To improve the interpretability of the deep learning model, SHAP (SHapley Additive exPlanations) values are used to estimate feature importance. SHAP applies a game-theoretic approach to assign each feature a contribution score for every individual prediction. By analyzing these values, operators can identify the features that most influence tunnel water levels, which supports operational decision-making.
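
To make the idea of a per-feature contribution score concrete, the sketch below computes exact Shapley values for a tiny hand-written model. It does not use the SHAP library, which approximates these values for real models. The linear `toy_model`, its feature names, and the baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical stand-in for the forecaster: rainfall, inflow, pump rate."""
    rainfall, inflow, pump_rate = features
    return 2.0 * rainfall + 0.5 * inflow - 1.0 * pump_rate


x = [3.0, 10.0, 4.0]
baseline = [1.0, 8.0, 4.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction at `x` and the prediction at `baseline`, which is the additivity property that makes such scores straightforward to read.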

Algorithm: Multivariate Multistep LSTM with Huber Loss for Tunnel Water Level Forecasting and Anomaly Detection

Input: Multivariate time series data $X$, LSTM model parameters, prediction horizon $H$, anomaly threshold $T$

Output: Forecasted water levels and anomaly labels

  1. Split $X$ into training ($X_{\text{train}}$) and testing ($X_{\text{test}}$) datasets
  2. Initialize model parameters and hyperparameters: $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$
  3. Training phase: train the LSTM model on $X_{\text{train}}$ using Huber loss. For each epoch $i$ and each mini-batch $j$ of training data:
       a. Forward pass: encode and decode the data, $\hat{X} = \text{LSTM}(\text{Encode}(X_{\text{batch}}, \Theta), \Theta)$
       b. Calculate the Huber loss for the $H$-step-ahead predictions: $L_{\text{Huber}} = \text{HuberLoss}(X_{\text{batch}}, \hat{X}, \delta)$
       c. Backpropagation: update the model weights with learning rate $\eta$ to minimize the Huber loss, $\Theta \leftarrow \Theta - \eta \nabla L_{\text{Huber}}$
  4. Testing phase: for each mini-batch $k$ of testing data:
       a. Forward pass: encode and decode the data, $\hat{X} = \text{LSTM}(\text{Encode}(X_{\text{batch}}, \Theta), \Theta)$
       b. Calculate the Huber loss for the $H$-step-ahead predictions of each sample: $L_{\text{sample}} = \text{HuberLoss}(X_{\text{batch}}, \hat{X}, \delta)$
       c. If $L_{\text{sample}} > T$, mark the sample as an anomaly; otherwise, mark it as normal
  5. Return the forecasted water levels and anomaly labels
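
The anomaly-labeling rule of the testing phase can be sketched in plain Python. The example assumes the per-sample loss is the mean Huber loss over the $H$ forecast steps. That aggregation, the threshold value, and the sample data are illustrative assumptions rather than details from the source.

```python
def huber(residual, delta):
    """Huber loss of a single residual."""
    r = abs(residual)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def sample_loss(actual, forecast, delta=1.0):
    """Mean Huber loss over the H forecast steps of one sample (assumed aggregation)."""
    return sum(huber(a - f, delta) for a, f in zip(actual, forecast)) / len(actual)


def label_anomalies(actuals, forecasts, threshold, delta=1.0):
    """Mark a sample as an anomaly when its loss exceeds the threshold T."""
    return [
        "anomaly" if sample_loss(a, f, delta) > threshold else "normal"
        for a, f in zip(actuals, forecasts)
    ]


# Two hypothetical samples, each with H = 3 forecast steps (water level in feet).
actuals = [[-60.0, -59.0, -58.0], [-50.0, -46.0, -44.0]]
forecasts = [[-60.5, -59.0, -58.5], [-55.0, -54.0, -53.0]]
labels = label_anomalies(actuals, forecasts, threshold=2.0)
print(labels)
```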

The use of SHAP values helps to explain the model's predictions by highlighting the contribution of different features. For example, this approach enables water utility operators to make informed decisions on when to activate pumps to prevent tunnel overflow based on critical feature importance.

References