The growing volume of data collected in water conservancy projects, together with the development of cloud computing, big data and the Internet of Things, places demands on the storage and processing of massive, multi-source and heterogeneous data that traditional theories and methods cannot meet. In this research, a big data platform for the grouting data of water conservancy projects is designed on the B/S (browser/server) model for data display, operation and management. The main functional modules of the platform are data resource downloading, data set uploading and running, customized algorithms, and visualization of running status, results and big data. The platform was then demonstrated on the Baihetan water conservancy project as a case study, for which a random forest model for predicting the unit grouting injection amount and a K-Means clustering model for detecting anomalous grouting results were built. By integrating structured and unstructured data and by adopting a Hadoop distributed cluster and parallelized data mining algorithms, the platform achieves integrated sharing of data resources, effective processing and knowledge discovery from data, and improves the efficiency and accuracy of data storage and processing. This research offers a new approach to big data storage and computing for water conservancy projects.
Key words
big data platform /
water conservancy project /
grouting /
Hadoop /
Spark /
random forest /
K-Means
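The abstract mentions a random forest model for predicting the unit grouting injection amount but gives neither its features nor its hyperparameters. The following is a minimal, dependency-free Python sketch of random forest regression (bootstrap sampling, a random feature subset at every split, predictions averaged over trees), not the authors' implementation. The feature columns (Lugeon value, grouting pressure, section depth), the synthetic target, and all names and settings such as `fit_forest`, `n_trees=25` and `max_depth=4` are hypothetical assumptions for illustration.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean: the impurity a regression tree minimises."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _grow(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow one regression tree. Internal nodes are (feature, threshold, left, right); leaves are means."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None  # (impurity after split, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset at this split
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split: make a leaf
        return mean(y)
    _, f, t = best
    parts = ([i for i, row in enumerate(X) if row[f] <= t],
             [i for i, row in enumerate(X) if row[f] > t])
    left, right = (_grow([X[i] for i in idx], [y[i] for i in idx],
                         depth + 1, max_depth, min_leaf, n_feats, rng) for idx in parts)
    return (f, t, left, right)


def _predict(node, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, min_leaf=2, seed=0):
    """Bagging plus random feature subsets: the two ingredients of a random forest."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 3)  # about p/3 features per split, a common regression default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        forest.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                            0, max_depth, min_leaf, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Average the per-tree predictions."""
    return mean(_predict(tree, x) for tree in forest)


if __name__ == "__main__":
    # Hypothetical features: [Lugeon value, grouting pressure, section depth]; synthetic target.
    X = [[l, p, d] for l in (1, 3, 5, 7) for p in (0.5, 1.0, 1.5) for d in (5, 15)]
    y = [8 * l + 20 * p + 0.5 * d for l, p, d in X]
    forest = fit_forest(X, y)
    print(predict_forest(forest, [1, 0.5, 5]), predict_forest(forest, [7, 1.5, 15]))
```

On a cluster like the one described in the abstract, the same model would more plausibly come from a distributed library such as Spark MLlib's `RandomForestRegressor`, which parallelizes tree training; the from-scratch version only makes the mechanics visible.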
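The abstract also names K-Means-based anomaly detection of grouting results without stating the detection rule. One common rule, assumed here, is to cluster the (already normalized) records and flag those whose distance to their assigned centroid exceeds the mean distance by more than `z` standard deviations. The sketch runs Lloyd's K-Means from several random starts and keeps the lowest-inertia result, as is usual practice; the function names, `k`, `z=2.0`, `n_init=10` and the two-feature toy records are hypothetical.

```python
import math
import random
import statistics


def _nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-Means with random restarts; returns the lowest-inertia (centroids, labels)."""
    best = None
    for r in range(n_init):
        rng = random.Random(seed + r)
        centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
        for _ in range(max_iter):
            labels = [_nearest(p, centroids) for p in points]
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                # an empty cluster keeps its previous centroid
                new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
            if new == centroids:  # centroids stopped moving: converged
                break
            centroids = new
        labels = [_nearest(p, centroids) for p in points]
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best[1], best[2]


def flag_anomalies(points, k, z=2.0, **kw):
    """Indices of points whose centroid distance exceeds mean + z * std of all distances."""
    centroids, labels = kmeans(points, k, **kw)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    limit = statistics.mean(dists) + z * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > limit]


if __name__ == "__main__":
    # Hypothetical normalized records: two tight groups plus one isolated record.
    grid = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
    records = grid + [(10 + x, 10 + y) for x, y in grid] + [(5.0, 30.0)]
    print(flag_anomalies(records, k=2))
```

With two tight groups and one far-off record, only the far-off record lies beyond the threshold. A rule that flags members of very small clusters is an equally plausible alternative; which one suits grouting data would depend on the project's own records.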