Application of a Hybrid Model Based on CEEMDAN and IMSA in Water Quality Prediction

doi:10.11988/ckyyb.20240254

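Abstract

Abstract (translated from the Chinese):

Water quality prediction is an important part of water pollution prevention and control, but water quality series are strongly random and non-stationary. To further improve the accuracy of surface water quality prediction, a new hybrid prediction model is proposed. First, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) decomposes the original water quality series. Fuzzy Dispersion Entropy (FuzzDE) then divides the components into high-, medium-, and low-complexity groups. Next, a Bidirectional Long Short-Term Memory network (BiLSTM), Least Squares Support Vector Regression (LSSVR), and an Extreme Learning Machine (ELM), each optimized by the Improved Mantis Search Algorithm (IMSA), predict the high-, medium-, and low-complexity groups respectively, and the predictions are combined and reconstructed. Finally, a BiLSTM error correction model corrects the remaining error to give the final prediction. Simulations use dissolved oxygen concentrations from two sections of the Youshui River, a tributary of the Yuanjiang River, and pH values from one section in the Xiangjiang River Basin. R² reaches over 90%, and the results show that the hybrid model predicts more accurately than the other comparison models.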

Abstract:

[Objectives] To improve the accuracy of water quality prediction, this study addresses four limitations of existing approaches: (1) traditional prediction methods often rely on elementary decomposition techniques, which limits their ability to extract meaningful features from the data; (2) single models and basic optimization algorithms yield low prediction accuracy; (3) most approaches do not exploit the strengths of different networks for components of differing complexity, so the models are used inefficiently; and (4) few studies apply error correction after prediction. To address these limitations, the study proposes a novel hybrid model for water quality prediction. [Methods] First, the original water quality sequence was decomposed using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). Next, Fuzzy Dispersion Entropy (FuzzDE) was used to group the components into high-, medium-, and low-complexity subsequences. Then, an Improved Mantis Search Algorithm (IMSA) optimized three distinct models: Bidirectional Long Short-Term Memory (BiLSTM) for the high-complexity components, Least Squares Support Vector Regression (LSSVR) for the medium-complexity components, and Extreme Learning Machine (ELM) for the low-complexity components. The component predictions were combined and reconstructed, and a BiLSTM-based error correction model then corrected the remaining error to yield the final prediction.
[Results] The study introduced four improvements to the original Mantis Search Algorithm (MSA): (1) Logistic-Tent chaotic mapping for population initialization, which distributes the initial solutions uniformly and randomly to improve global search capability and convergence speed; (2) a nonlinear acceleration factor, which refines MSA’s core update formula so that the search shifts from global exploration to local exploitation and is less likely to be trapped in local optima; (3) an elite-guided adaptive update strategy, which reduces the excessive randomness of the position update applied when a mantis attack fails, improving late-stage search efficiency while preserving some randomness; and (4) opposition-based learning, which generates the individual opposite to the current one to strengthen global optimization. IMSA was validated on benchmark functions (Rosenbrock as the unimodal case and Michalewicz as the multimodal case), confirming improved global search and convergence precision. After the network hyperparameters were determined, ablation experiments quantified the contribution of each strategy to prediction performance. Finally, the assignment of models to components was validated: FuzzDE was used to calculate the complexity of each component and form high-, medium-, and low-complexity subsequences, and the learning capability of each network on these subsequences was compared, supporting the use of BiLSTM for high-complexity components, LSSVR for medium-complexity components, and ELM for low-complexity components.
[Conclusions] Simulations used dissolved oxygen (DO) concentrations from two sections of the Youshui River (a tributary of the Yuanjiang River) and pH values from one section in the Xiangjiang River Basin. Missing values were filled by linear interpolation. Outliers were retained, because they may reflect sudden pollution events or discontinuous non-point source pollution, and removing them could discard information. Eleven comparative experiments, combining decomposition, entropy-based grouping, algorithm optimization, and error correction, were set up to evaluate the effect of each part of the method. Model performance was assessed with RMSE, R², and MAPE. The R² of the full hybrid model exceeded 0.90 at all three sections (0.913, 0.949, and 0.918), and its prediction accuracy was higher than that of the other comparative models.
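
For reference, the three evaluation metrics named in the abstract can be computed as in the following minimal Python sketch. It uses the standard textbook definitions (R² as one minus the ratio of the residual to the total sum of squares, MAPE expressed in percent). The abstract does not state the exact formulas used, so these definitions, the function names, and the sample values are assumptions for illustration only.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

# Hypothetical dissolved-oxygen readings and predictions (not from the paper).
observed = [8.0, 7.5, 9.0, 8.5]
predicted = [8.2, 7.4, 8.7, 8.6]
print(rmse(observed, predicted), r2(observed, predicted), mape(observed, predicted))
```

Lower RMSE and MAPE and a higher R² indicate a better fit, which is the sense in which the model comparison tables rank the eleven models.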

Key words: water quality prediction, CEEMDAN decomposition, fuzzy dispersion entropy, Mantis Search Algorithm, hybrid model


GUO Li-jin, WU Hao-tian. Application of a Hybrid Model Based on CEEMDAN and IMSA in Water Quality Prediction[J]. Journal of Changjiang River Scientific Research Institute, 2025, 42(6): 60-70.

Figures/Tables: 15

Fig.1 MSA optimization flowchart

Fig.2 Bifurcation, frequency, and distribution of chaotic mapping

Fig.3 Variation of acceleration factors

Fig.4 Fitness convergence curves for the Rosenbrock and Michalewicz benchmark functions

Fig.5 Hybrid model prediction flowchart

Fig.6 Components of CEEMDAN decomposition

Fig.7 Fuzzy dispersion entropy value distribution of each component

Table 1 Hyperparameters to be optimized for different models

Network model	Hyperparameters	Hyperparameter range
BiLSTM	Number of neurons n, window length m, learning rate r	n ∈ [1,100], m ∈ [2,15], r ∈ [0.001,0.1]
LSSVR	Regularization parameter C, RBF kernel parameter gamma	C ∈ [0.1,100], gamma ∈ [0.01,10]
ELM	Weight ω, threshold b	ω ∈ [-1,1], b ∈ [0,1]
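
The ranges in Table 1 define the search space that the optimizer explores. The sketch below shows one way a population could be initialized inside the BiLSTM bounds with a Logistic-Tent chaotic sequence and then mirrored by opposition-based learning, two of the strategies named in the abstract. The map is written in a commonly cited Logistic-Tent form; the paper's exact variant is not given here, so the map formula, the control parameter `mu`, the seed, the population size, and all function names are assumptions.

```python
# Search bounds for the BiLSTM hyperparameters, taken from Table 1.
BOUNDS = {"n": (1, 100), "m": (2, 15), "r": (0.001, 0.1)}

def logistic_tent(x, mu=3.99):
    """One step of a Logistic-Tent map on [0, 1) (assumed form, mu in (0, 4])."""
    if x < 0.5:
        return (mu * x * (1 - x) + (4 - mu) * x / 2) % 1.0
    return (mu * x * (1 - x) + (4 - mu) * (1 - x) / 2) % 1.0

def chaotic_population(size, bounds, seed=0.37):
    """Build `size` individuals by scaling successive chaotic values into the bounds."""
    x = seed
    population = []
    for _ in range(size):
        individual = {}
        for name, (low, high) in bounds.items():
            x = logistic_tent(x)
            individual[name] = low + x * (high - low)
        population.append(individual)
    return population

def opposite(individual, bounds):
    """Opposition-based learning: reflect each coordinate within its bounds."""
    return {name: bounds[name][0] + bounds[name][1] - value
            for name, value in individual.items()}

population = chaotic_population(5, BOUNDS)
mirrored = [opposite(ind, BOUNDS) for ind in population]
# n and m are integer hyperparameters, so they would be rounded before training.
print(population[0], mirrored[0])
```

In a full optimizer the fitter of each individual and its opposite would typically be kept, but that selection step depends on the fitness function and is left out here.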


Table 2 R² values of algorithmic ablation tests for different models

Model	BiLSTM	LSSVR	ELM
MSA	0.705	0.732	0.721
MSA1	0.742	0.768	0.739
MSA2	0.733	0.751	0.731
MSA3	0.728	0.740	0.727
MSA4	0.738	0.759	0.734
IMSA	0.743	0.770	0.739

Table 3 Error values of components predicted by different models

Component	BiLSTM	LSSVR	ELM
IMF1	0.321	0.357	0.381
IMF2	0.091	0.117	0.239
IMF3	0.028	0.047	0.079
IMF4	0.024	0.020	0.025
IMF5	0.022	0.020	0.028
IMF6	0.019	0.018	0.018
IMF7	0.022	0.015	0.008
IMF8	0.016	0.021	0.008
IMF9	0.012	0.015	0.001
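
The assignment of models to components can be read off Table 3 by taking the smallest error in each row. The short Python sketch below does this with the values copied from the table. The variable and function names are illustrative, ties are reported rather than broken, and the FuzzDE grouping into complexity levels is not reproduced here.

```python
MODELS = ("BiLSTM", "LSSVR", "ELM")

# Error values copied from Table 3, in the column order of MODELS.
ERRORS = {
    "IMF1": (0.321, 0.357, 0.381),
    "IMF2": (0.091, 0.117, 0.239),
    "IMF3": (0.028, 0.047, 0.079),
    "IMF4": (0.024, 0.020, 0.025),
    "IMF5": (0.022, 0.020, 0.028),
    "IMF6": (0.019, 0.018, 0.018),
    "IMF7": (0.022, 0.015, 0.008),
    "IMF8": (0.016, 0.021, 0.008),
    "IMF9": (0.012, 0.015, 0.001),
}

def best_models(row):
    """Return every model that attains the lowest error in a table row."""
    lowest = min(row)
    return [model for model, error in zip(MODELS, row) if error == lowest]

for component, row in ERRORS.items():
    print(component, "/".join(best_models(row)))
```

With these values the lowest error falls to BiLSTM for IMF1 to IMF3, to LSSVR for IMF4 to IMF6 (tied with ELM on IMF6), and to ELM for IMF7 to IMF9.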

Fig.8 Prediction performance of components before and after algorithm improvement

Fig.9 Prediction results of hybrid model

Table 4 Prediction errors of different models for Jiangkou

Model	Model ID	RMSE	R²	MAPE
BiLSTM	M1	0.410 9	0.618	3.041
LSSVR	M2	0.390 6	0.655	2.725
ELM	M3	0.404 2	0.630	2.841
CEEMDAN-BiLSTM	M4	0.282 2	0.820	2.309
CEEMDAN-LSSVR	M5	0.280 5	0.822	2.365
CEEMDAN-ELM	M6	0.264 6	0.842	2.197
CEEMDAN-FuzzDE-BiLSTM-LSSVR-ELM	M7	0.242 6	0.867	1.846
CEEMDAN-FuzzDE-MSABiLSTM-MSALSSVR-MSAELM	M8	0.232 9	0.877	1.781
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM	M9	0.213 7	0.898	1.775
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-EC	M10	0.219 5	0.891	1.697
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-CEEMDAN-EC	M11	0.195 9	0.913	1.672

Table 5 Prediction errors of different models for Fengtan Reservoir

Model	Model ID	RMSE	R²	MAPE
BiLSTM	M1	0.475 8	0.787	3.466
LSSVR	M2	0.623 7	0.634	4.046
ELM	M3	0.612 3	0.647	3.992
CEEMDAN-BiLSTM	M4	0.372 3	0.870	2.955
CEEMDAN-LSSVR	M5	0.383 0	0.862	2.817
CEEMDAN-ELM	M6	0.379 1	0.865	2.927
CEEMDAN-FuzzDE-BiLSTM-LSSVR-ELM	M7	0.306 9	0.911	2.289
CEEMDAN-FuzzDE-MSABiLSTM-MSALSSVR-MSAELM	M8	0.285 0	0.923	2.059
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM	M9	0.262 6	0.935	2.005
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-EC	M10	0.254 9	0.939	1.868
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-CEEMDAN-EC	M11	0.232 7	0.949	1.772

Table 6 Prediction errors of different models for Lübutou

Model	Model ID	RMSE	R²	MAPE
BiLSTM	M1	0.223 8	0.666	1.923
LSSVR	M2	0.222 5	0.670	2.310
ELM	M3	0.217 3	0.685	2.060
CEEMDAN-BiLSTM	M4	0.166 3	0.815	1.527
CEEMDAN-LSSVR	M5	0.175 1	0.795	1.736
CEEMDAN-ELM	M6	0.148 9	0.853	1.523
CEEMDAN-FuzzDE-BiLSTM-LSSVR-ELM	M7	0.137 1	0.875	1.323
CEEMDAN-FuzzDE-MSABiLSTM-MSALSSVR-MSAELM	M8	0.128 1	0.891	1.160
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM	M9	0.120 8	0.903	1.023
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-EC	M10	0.121 6	0.902	0.947
CEEMDAN-FuzzDE-IMSABiLSTM-IMSALSSVR-IMSAELM-CEEMDAN-EC	M11	0.110 6	0.918	0.907
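
To put the three error tables on a common footing, the sketch below computes the relative RMSE reduction of the full hybrid model (M11) over the single BiLSTM baseline (M1) for each table. The RMSE values are copied from Tables 4 to 6; the choice of M1 as the baseline and the helper name are assumptions made for illustration.

```python
# RMSE of the single BiLSTM (M1) and the full hybrid model (M11), from Tables 4 to 6.
RMSE = {
    "Table 4": {"M1": 0.4109, "M11": 0.1959},
    "Table 5": {"M1": 0.4758, "M11": 0.2327},
    "Table 6": {"M1": 0.2238, "M11": 0.1106},
}

def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - improved) / baseline

for table, values in RMSE.items():
    pct = 100 * relative_reduction(values["M1"], values["M11"])
    print(f"{table}: RMSE reduced by {pct:.1f}%")
```

On these figures the RMSE is roughly halved at all three sections.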
