Global Energy Interconnection
Volume 3, Issue 3, Jun 2020, Pages 283-291
Analysis of the trend of global power sources based on comment emotion mining
Abstract
In recent years, renewable energy technologies have been developed vigorously, and related supporting policies have been issued. The developmental trend of different energy sources directly affects the future developmental pattern of the energy and power industry. Energy trend research can be quantified through data statistics and model calculations; however, parameter settings and optimization are difficult, and the analysis results sometimes do not reflect objective reality. This paper proposes an energy and power information analysis method based on emotion mining. The method collects energy commentary news and literature reports from many authoritative media around the world and builds a convolutional neural network model and a text analysis model for topic classification and positive/negative emotion evaluation, which yields text evaluation matrices for all collected texts. Finally, a long short-term memory model is employed to predict the future development prospects and market trends of various types of energy based on the emotions analyzed over different time spans. Experimental results indicate that energy trend analysis based on this method is consistent with the real scenario, has good applicability, and can provide a useful reference for the development of energy and power resources as well as other industry areas.
1 Introduction
On September 26, 2015, China’s President Xi Jinping proposed, at the UN Development Summit, exploring the construction of a global energy interconnection (GEI) to meet the global demand for electricity in a clean and green manner [1]. The GEI is a major initiative for promoting a clean energy revolution and coping with climate change on a global scale; it has been widely commended and has drawn an active response worldwide.
It is important to collect and analyze the latest information and data to gauge the development trends of fields related to GEI construction dynamically and objectively, and to provide scientific decision support for strategy formulation and business execution. To study the developmental status and trends of different types of power sources around the world, researchers and scholars use information systems to collect, in a timely manner, unstructured data and information about policies, technologies, and industries related to energy and electric power. Based on the collected data and information, the global energy development trend can be tracked efficiently; this supports related quantitative analysis, which in turn makes planning, design, and ad-hoc studies more scientific, objective, and viable. However, unstructured data and information originate from many websites and global media, and their fragmentation, variety, veracity [2–4], redundancy, and format differences make it very difficult to analyze them and extract the required conclusions. Thus, using natural language processing (NLP) technologies to analyze large-scale energy-related news, articles, and reports is critical to understanding the current global energy status quo and future development trends accurately.
Some scholars have conducted research on news information extraction, including the extraction of news event elements based on Chinese named entity recognition and coreference resolution [5], the extraction of single-document abstracts based on primary and secondary relationships within the text [6], and the extraction of news keywords based on PageRank [7]. Although this research solves the problem of extracting key information from articles and can help readers read news quickly, it does not handle the task of extracting news on new topics across multiple texts well. In this study, thousands of news items and reports related to energy and power, collected automatically on a daily basis from various global sources by the operation analysis information platform (built by GEIDCO), are employed as the source data. This paper reports a holistic and credible methodology for analyzing the current scenario and future trends of global energy development; the proposed method involves constructing multiple deep neural networks that comprise content classification and emotion mining models, and devising a future trend mining algorithm. A deep convolutional neural network (CNN) is used to classify news and reports by power source, and a second CNN is constructed to classify the emotional tendency of each news item and report. Thereafter, a long short-term memory (LSTM) model is used to mine the emotional tendencies of each energy source and predict its future development trend. With this methodology, the status quo and development trend of “clean development” under the GEI initiative can be analyzed and predicted more scientifically and objectively.
2 Current research review
Currently, methods for performing emotion analysis and mining involve the use of various natural language processing technologies and models. To analyze the current situation and the future trends of global energy development accurately and objectively for different energy sources, this paper reviews the research status quo of content analysis and emotion mining technology.
2.1 Content analysis technology
(1) Keyword matching algorithm [8]
The keyword matching algorithm constructs a vocabulary of category keywords and then calculates the frequency of appearance of each keyword, and hence of each category, in the target text. Finally, the algorithm ranks the category scores to determine the final content category.
(2) Knowledge engineering method [9]
The knowledge engineering method requires professionals to define in advance a large number of reasoning rules for and between categories. The final category is decided based on how frequently the document satisfies the reasoning rules; the text features used can also serve as a measure of how well specific rules match.
(3) Statistical learning method [10]
A statistical learning method first requires manually constructed learning and training sets for a batch of classifiers. The computer statistically analyzes the classification features and the rules between documents using the classification algorithm and then constructs a document classification model. After being processed by the same feature extraction and quantification models, newly input documents are fed into the document classifier, and the system provides the final document classification results based on statistical probability. Owing to its sound mathematical interpretation and generality, this method is the primary method for document classification.
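The keyword matching approach in item (1) can be illustrated with a minimal sketch. The category vocabularies and the function name `classify_by_keywords` below are invented for illustration and are not taken from the paper; tokenization is a plain whitespace split, which would not suffice for Chinese text.

```python
from collections import Counter

# Hypothetical category vocabularies; a real system would curate far larger lists.
CATEGORY_KEYWORDS = {
    "wind power": {"wind", "turbine", "offshore"},
    "solar energy": {"solar", "photovoltaic", "panel"},
    "nuclear energy": {"nuclear", "reactor", "uranium"},
}

def classify_by_keywords(text):
    """Return (best_category, scores) based on keyword frequency counts."""
    tokens = Counter(text.lower().split())
    scores = {
        category: sum(tokens[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # Rank categories by score; ties resolve to the first category listed.
    best = max(scores, key=scores.get)
    return best, scores

label, scores = classify_by_keywords(
    "New offshore wind turbine project outpaces solar panel installs"
)
```

Here the wind vocabulary matches three tokens and the solar vocabulary two, so the text is assigned to "wind power".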
2.2 Emotion analysis technology
2.2.1 Emotion dictionary [11]
This method first constructs an emotion dictionary in which each emotion word is assigned a positive or negative polarity; the better dictionaries also annotate emotion intensity. The process of emotion analysis is similar to the keyword matching algorithm.
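As a rough sketch of dictionary-based scoring, the snippet below sums signed intensities of matched words. The dictionary entries, their weights, and the function name `dictionary_polarity` are assumptions made for this example only.

```python
# Hypothetical emotion dictionary: word -> signed intensity (sign = polarity).
EMOTION_DICT = {"growth": 1.0, "breakthrough": 2.0, "leak": -2.0, "delay": -1.0}

def dictionary_polarity(text):
    """Sum the signed intensities of matched words and map the total to a label."""
    score = sum(EMOTION_DICT.get(token, 0.0) for token in text.lower().split())
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

polarity, score = dictionary_polarity("Reactor leak causes delay despite growth")
```

The matched words contribute -2, -1, and +1, so the sentence is labeled negative.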
2.2.2 Machine learning [12]
Emotion classification technology based on machine learning is similar to statistical learning in content classification, with the target switched from document content to emotion. Through feature extraction and rule discovery on the training data, emotion analysis models can be constructed to analyze the emotion of the targeted text.
2.2.3 Deep learning [13]
The deep learning approach to text emotion analysis constructs an emotion classifier with a deep network model, building on the text features used in machine learning classification. Compared with traditional machine learning methods, deep learning methods require more training samples, but they can automatically learn rich and effective features from complex texts of varying length and thereby obtain better results.
2.3 Convolutional neural network
A CNN [14] is a neural network with a deep structure that contains convolution operations. The CNN model was first applied successfully in the field of computer vision, and researchers subsequently introduced it into the field of NLP. Since then, CNNs have been widely used in academic and industrial NLP scenarios such as text classification [15], machine translation [16], and recommendation systems [17].
Compared with traditional machine learning methods such as the support vector machine [18] and the conditional random field [19], a CNN demonstrates a stronger learning ability and higher prediction accuracy, and it overcomes disadvantages of traditional machine learning methods such as data sparseness, dimension explosion, and poor generalization.
2.4 Long short-term memory network
The LSTM network [20] is a recurrent neural network for time series and a variant of the traditional recurrent neural network (RNN) [21]. Currently, LSTM and its variants achieve state-of-the-art results in various NLP tasks, especially sequence processing tasks such as sentiment analysis [22], trend prediction [23], and entity recognition [24].
Compared with other algorithms, RNN and LSTM networks have the advantages of processing a sequence as a whole and retaining information in memory units; therefore, they handle time series data better. Compared with a traditional RNN, the LSTM network can learn long-term dependencies and avoids the vanishing gradient problem of the RNN.
3 Public opinion analysis and prediction model for global energy development
The public opinion analysis and prediction model for energy-related news is divided into classification and prediction. First, the news and information related to energy are collected using the operation analysis information platform and used as the research sample. Next, CNN models are used to classify the content and emotions, which identifies the power source and emotional tendency of each sample. The content and emotion classification results are then input into the LSTM model to form emotion fitting curves for each power source over the past and future time series, which helps predict the future trend.
3.1 Analysis process
Assume that in a certain period T, there is an energy review document set (DOC) with E emotional descriptions of S topics.
$$T = \{t_1, t_2, \ldots, t_m\}$$
where $t_i$ represents a smaller time segment in the historical period T, e.g., “day,” “month,” or “year,” with a total of m time segments.
$$E = \{e_1, e_2, \ldots, e_k\}$$
Among the emotional descriptions, E indicates that there are k evaluation emotions for power sources, such as “positive,” “negative,” and “neutral.”
$$S = \{s_1, s_2, \ldots, s_u\}$$
where S indicates that the existing power source topics can be divided into u categories such as “hydropower,” “thermopower,” and “wind power.”
$$DOC = \{d_1, d_2, \ldots, d_n\}$$
where DOC indicates that there are n power source review articles to be processed. Because an article may discuss a single power source or a mix of multiple sources, this paper uses only the dominant classification result for each article. The overall process is shown in Fig.1.
Fig.1 Schematic of energy trend analysis process
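One way to picture the text evaluation matrices implied by this notation is a count of documents per topic, time segment, and emotion. The sketch below is an illustrative assumption about the data layout, not the paper's implementation; the records and the name `build_evaluation_matrix` are invented.

```python
from collections import defaultdict

EMOTIONS = ("positive", "neutral", "negative")  # E, with k = 3

# Each document d_i reduced to (time segment, dominant topic, emotion).
# These records are invented for illustration.
docs = [
    ("2017", "wind power", "positive"),
    ("2017", "wind power", "negative"),
    ("2018", "wind power", "positive"),
    ("2018", "nuclear energy", "negative"),
]

def build_evaluation_matrix(records):
    """Return {topic: {time_segment: [count per emotion, in EMOTIONS order]}}."""
    matrix = defaultdict(lambda: defaultdict(lambda: [0] * len(EMOTIONS)))
    for segment, topic, emotion in records:
        matrix[topic][segment][EMOTIONS.index(emotion)] += 1
    return matrix

matrix = build_evaluation_matrix(docs)
```

Each topic then has an m-by-k table of counts (time segments by emotions) that a sequence model can consume.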
3.2 Text classification algorithm based on CNN
Although the final output labels of energy topic classification and comment emotion classification differ, both tasks are text classification problems; therefore, this paper applies the same CNN-based Chinese text classification framework [25] to both (Fig.2).
Fig.2 Schematic of CNN-based Chinese text classification algorithm
3.2.1 Preprocessing
A Chinese word tokenizer toolkit was used to segment the Chinese energy text, and stop words were then deleted to reduce the effect of noise words on the classification. For other languages, different toolkits were used.
3.2.2 Embedding layer
Let $x_i$ represent the ith word in the word sequence and $x_{i:j}$ represent the ith to jth words in the word sequence. Then, for the preprocessed Chinese word sequence $x_{1:n}$ of length n, Word2vec is used to directly train word vectors of dimension k. All word vectors are then concatenated to obtain a word vector matrix of dimension n×k, which serves as the input of the convolution layer.
3.2.3 Convolution layer
A convolution kernel with a width equal to the word vector dimension k and a height equal to h is convolved with a word window $x_{i:i+h-1}$ containing h words; the feature $c_i$ is extracted by applying a one-dimensional filter w of length h×k to the word window:
$$c_i = f(w \cdot x_{i:i+h-1} + b)$$
where f is the activation function and b is the bias. Sliding the window over every position of the sequence yields an (n−h+1)-dimensional feature map c:
$$c = [c_1, c_2, \ldots, c_{n-h+1}]$$
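A minimal pure-Python sketch of this convolution step follows. The choice of ReLU as the activation f and all numeric values are assumptions for illustration; note that sliding a window of h words over n words produces n−h+1 features.

```python
def relu(v):
    return max(0.0, v)

def conv_feature_map(x, w, b, h, f=relu):
    """x: list of n word vectors (dimension k); w: flat filter of length h*k."""
    n = len(x)
    feature_map = []
    for i in range(n - h + 1):
        # Flatten the window of h consecutive word vectors, i.e. x_{i:i+h-1}.
        window = [value for word in x[i:i + h] for value in word]
        feature_map.append(f(sum(wi * xi for wi, xi in zip(w, window)) + b))
    return feature_map

# Toy input: n = 4 words, k = 2 dimensions, window height h = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [1.0, -1.0, 0.5, 0.5]
c = conv_feature_map(x, w, b=0.0, h=2)
```

With these toy values the feature map has 4 − 2 + 1 = 3 entries.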
3.2.4 Pooling layer
The maximum value $\hat{c}$ of the feature map extracted by each filter is obtained using the max pooling operation as follows:
$$\hat{c} = \max\{c\}$$
For m filters, a vector z of length m is obtained by one layer of convolution followed by one layer of pooling:
$$z = [\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_m]$$
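The pooling step reduces each filter's feature map to a single number, so the output length depends only on the number of filters and not on the text length. A small sketch, with invented feature maps:

```python
def max_over_time(feature_maps):
    """Keep the largest activation from each filter's feature map."""
    return [max(c) for c in feature_maps]

# Feature maps from m = 3 hypothetical filters; their lengths differ
# because filters with different window heights h see different n - h + 1.
feature_maps = [[1.5, 0.0, 0.0], [0.2, 0.9], [0.0, 0.3, 0.1, 0.7]]
z = max_over_time(feature_maps)
```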
3.2.5 Fully connected layer
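The final feature extraction vector y is obtained by feeding the vector z into the fully connected layer as follows:
$$y = W \cdot (z \circ r) + b$$
where $\circ$ denotes element-wise multiplication, W and b are the weight matrix and bias of the layer, and r is the dropout mask used to prevent overfitting. When classifying the energy review content, softmax is used as the activation function.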
When classifying the energy comment emotions,the sigmoid function is used as the activation function.
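The two output heads can be sketched as follows. This is an illustrative assumption about how the layer might be wired: the weights are invented, and the dropout shown is the common "inverted" variant (kept units are rescaled during training), which the paper does not specify.

```python
import math
import random

def dense(z, weights, biases, mask):
    """y_j = sum_i W[j][i] * z[i] * mask[i] + b[j]."""
    return [
        sum(w_ji * z_i * r_i for w_ji, z_i, r_i in zip(row, z, mask)) + b_j
        for row, b_j in zip(weights, biases)
    ]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dropout_mask(size, keep_prob, rng):
    # Inverted dropout: kept units are scaled by 1/keep_prob, dropped units are 0.
    return [1.0 / keep_prob if rng.random() < keep_prob else 0.0 for _ in range(size)]

z = [1.5, 0.9, 0.7]
train_mask = dropout_mask(len(z), keep_prob=0.5, rng=random.Random(0))  # training
eval_mask = [1.0] * len(z)  # inference: nothing is dropped

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
topic_probs = softmax(dense(z, identity, [0.0] * 3, eval_mask))           # multi-class topic head
emotion_prob = sigmoid(dense(z, [[1.0, -1.0, -1.0]], [0.0], eval_mask)[0])  # binary emotion head
```

Softmax suits the topic head because exactly one of several power-source categories is chosen, whereas the sigmoid gives a single positive-versus-negative probability.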
3.3 Trend prediction algorithm based on LSTM
3.3.1 Model architecture
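For an LSTM model (Fig.3) [26], let the input sequence be $p = (p_1, p_2, \ldots, p_n)$, the corresponding hidden layer sequence be $(H_1, H_2, \ldots, H_n)$, and the output sequence be $q = (q_1, q_2, \ldots, q_n)$. In the standard formulation, the calculation is given by
$$f_t = \sigma(W_f [H_{t-1}, p_t] + b_f)$$
$$i_t = \sigma(W_i [H_{t-1}, p_t] + b_i)$$
$$o_t = \sigma(W_o [H_{t-1}, p_t] + b_o)$$
$$\tilde{c}_t = \tanh(W_c [H_{t-1}, p_t] + b_c)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$$
$$H_t = o_t \circ \tanh(c_t)$$
where $p_t$ is the network input, $f_t$ is the forget gate, $i_t$ is the input gate, $o_t$ is the output gate, $c_t$ is the cell state, $\tilde{c}_t$ is the candidate cell state, W is the weight matrix, b is the bias, $\sigma$ is the sigmoid function, and $\circ$ denotes element-wise multiplication.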
Fig.3 Long short-term memory recurrent unit diagram
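A single step of a standard LSTM cell can be written out in a few lines. This sketch assumes the textbook formulation (gates acting on the concatenation of the previous hidden state and the current input) and uses toy all-zero weights; the paper's exact variant and parameter shapes are not stated.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, v, b):
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(p_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (W, b) acting on [h_prev, p_t]."""
    v = h_prev + p_t  # list concatenation [H_{t-1}, p_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, v, b)]

    f_t = gate("f", sigmoid)      # forget gate
    i_t = gate("i", sigmoid)      # input gate
    o_t = gate("o", sigmoid)      # output gate
    c_hat = gate("c", math.tanh)  # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hidden size 1, input size 1; all-zero weights make every gate output 0.5.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```

With zero weights the forget gate keeps half of the previous cell state (2.0 becomes 1.0), which shows how the cell state carries information across steps independently of the current input.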
3.3.2 Training phase
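The network is trained to predict the energy trend. First, the original time series is divided into a training set and a test set, and the training set $(t_1, t_2, \ldots, t_m)$ is obtained. Second, the normalized elements $t_i'$ of the training set are obtained as
$$t_i' = \frac{t_i - \mu_t}{\sigma_t}$$
where $\mu_t$ is the mean value and $\sigma_t^2$ is the variance. The calculation method is given by
$$\mu_t = \frac{1}{m}\sum_{i=1}^{m} t_i, \qquad \sigma_t^2 = \frac{1}{m}\sum_{i=1}^{m} (t_i - \mu_t)^2$$
When the time step is L, let the network input be $p = (p_1, p_2, \ldots, p_L)$, the theoretical output be $q = (q_1, q_2, \ldots, q_L)$, and the output obtained after p passes through the hidden layer be $g = (g_1, g_2, \ldots, g_L)$. Then, the loss function of the training process is the mean squared error between g and q:
$$loss = \frac{1}{L}\sum_{i=1}^{L} (g_i - q_i)^2$$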
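The training-phase preprocessing can be sketched as below, assuming z-score standardization with training-set statistics, input windows of length L paired with the same window shifted one step ahead, and a mean squared error loss. The helper names and the toy series are invented for illustration.

```python
import math

def zscore_fit(series):
    """Return the mean and standard deviation of the training series."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((t - mean) ** 2 for t in series) / len(series))
    return mean, std

def zscore(series, mean, std):
    return [(t - mean) / std for t in series]

def sliding_windows(series, L):
    """Pair each length-L input window with the window shifted one step ahead."""
    return [(series[s:s + L], series[s + 1:s + L + 1]) for s in range(len(series) - L)]

def mse(predicted, target):
    return sum((g - q) ** 2 for g, q in zip(predicted, target)) / len(target)

train = [2.0, 4.0, 6.0, 8.0]
mu, sigma = zscore_fit(train)
normalized = zscore(train, mu, sigma)
pairs = sliding_windows(normalized, L=2)
```

Fitting the statistics on the training set only avoids leaking information from the test period into the model.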
3.3.3 Testing phase
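During trend prediction, the input for the first step of the test set is standardized and fed into the trained LSTM, which outputs the prediction for the next time step. This prediction is then appended to the input window to predict the following step; by analogy, the predicted time series $g_{test}$ is obtained step by step.
After obtaining $g_{test}$, we denormalize it to obtain the corresponding prediction time series $g_{test}'$ as
$$g_{test}' = g_{test}\,\sigma_t + \mu_t$$
where $\mu_t$ and $\sigma_t$ are the mean and standard deviation of the training set.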
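A rolling forecast of this kind can be sketched as follows. The `toy_model` function is a hypothetical stand-in for a trained LSTM (it merely extrapolates the last difference), and the mean and standard deviation passed to `denormalize` are invented values.

```python
def rolling_forecast(model, history, L, steps):
    """Feed the last L values to `model`, append its output, and repeat."""
    window = list(history[-L:])
    predictions = []
    for _ in range(steps):
        next_value = model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return predictions

def denormalize(series, mean, std):
    """Invert z-score standardization using the training-set statistics."""
    return [g * std + mean for g in series]

# Hypothetical stand-in for a trained LSTM: extrapolates the last difference.
def toy_model(window):
    return window[-1] + (window[-1] - window[-2])

g_test = rolling_forecast(toy_model, [-1.0, 0.0, 1.0], L=2, steps=3)
g_test_restored = denormalize(g_test, mean=5.0, std=2.0)
```

Because each prediction becomes an input for the next step, errors can accumulate over long horizons, which is one reason to keep the forecast span short.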
4 Experimental results
4.1 Experimental process
This study used the CNN-LSTM model to analyze and predict public opinion on energy-related news. The energy news collected daily by the operation analysis system was preprocessed and input into the CNN model for text classification to obtain the energy type and emotional tendency of each news item. These two labels were then combined with the time extracted from the news item to form a three-dimensional feature, which was input into the LSTM model for trend prediction; the future public opinion trend of each energy source was predicted from the energy-sentiment curve fitted over the time series.
4.2 Evaluating indicator
The current proportion of each emotion in the total number of comments is used as the basis for analyzing the current scenario of global energy. The fitted curves of the proportion of each emotion in the total, and of the difference between the positive and negative proportions over time, are used as the basis for predicting the future energy development trend.
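These indicators amount to simple ratios. A minimal sketch, using invented counts for one power source in one period and a hypothetical function name:

```python
def emotion_indicators(counts):
    """counts: {"positive": int, "neutral": int, "negative": int} for one period."""
    total = sum(counts.values())
    shares = {emotion: n / total for emotion, n in counts.items()}
    # Positive-negative difference, the second indicator used for trend fitting.
    shares["difference"] = shares["positive"] - shares["negative"]
    return shares

# Invented counts for one power source in one year.
indicators = emotion_indicators({"positive": 50, "neutral": 30, "negative": 20})
```

Computing the same indicators for each time segment gives the per-source series that the trend model fits.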
4.3 Statistical analysis
The international news, reports, and comments regarding energy collected by an information system are analyzed and mined (as summarized in Table 1). The annual review volume shows an increasing trend, with an exponential growth in 2013 and 2014, which was caused by an energy crisis at that time that drew considerably higher international attention.
Thermal power, as a stable energy source, accounts for the largest number of articles and is currently the dominant topic. Solar energy and tidal energy, as representatives of new energy, have also gained considerable attention with the development of technology and the promotion of pilot applications, and the number of comments related to these sources has increased gradually.
Table 1 Annual review statistics of energy in various fields
| Type | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | Total |
|---|---|---|---|---|---|---|---|
| Thermal power | 1852 | 3050 | 3387 | 4180 | 3786 | 3288 | 25705 |
| Hydropower | 770 | 1260 | 1571 | 1676 | 1619 | 1443 | 10643 |
| Wind power | 386 | 362 | 391 | 483 | 462 | 527 | 3261 |
| Tidal energy | 5 | 2 | 4 | 3 | 18 | 14 | 56 |
| Solar energy | 412 | 345 | 348 | 423 | 507 | 576 | 3028 |
| Nuclear energy | 715 | 855 | 1034 | 1422 | 1392 | 1288 | 9090 |
| Hybrid energy | 372 | 357 | 345 | 603 | 684 | 684 | 3546 |
| Total | 4512 | 6231 | 7080 | 8790 | 8468 | 7820 | |
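The yearly figures in the table can be loaded into a small structure to recompute the column totals and derive shares. The snippet below is only an illustrative sketch; the variable names are invented, and the counts are copied from the table.

```python
YEARS = (2013, 2014, 2015, 2016, 2017, 2018)
COUNTS = {
    "Thermal power": (1852, 3050, 3387, 4180, 3786, 3288),
    "Hydropower": (770, 1260, 1571, 1676, 1619, 1443),
    "Wind power": (386, 362, 391, 483, 462, 527),
    "Tidal energy": (5, 2, 4, 3, 18, 14),
    "Solar energy": (412, 345, 348, 423, 507, 576),
    "Nuclear energy": (715, 855, 1034, 1422, 1392, 1288),
    "Hybrid energy": (372, 357, 345, 603, 684, 684),
}

# Sum each year's column across all power sources.
yearly_totals = [sum(column) for column in zip(*COUNTS.values())]

# Share of thermal power in each year's review volume.
thermal_share = [n / total for n, total in zip(COUNTS["Thermal power"], yearly_totals)]
```

The recomputed yearly totals match the Total row of the table (4512, 6231, 7080, 8790, 8468, 7820).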
4.4 Prediction of development trend for all power sources
The time and emotion distributions of the energy reports and news for different power sources are shown in Fig.4. The overall time distribution of news and reports for each power source shows a trend of first increasing and then flattening, whereas the emotion distribution decreases in the order of positive, neutral, and negative. This indicates that attention to the global energy industry has moved from low attention, through a substantial increase, to a recent leveling off. Further, this trend indicates that, with the rapid growth of the global economy, people have become increasingly aware of the importance of the energy problem, and owing to the aggravation of global warming and environmental degradation, people have started to pay close attention to the related energy problems. Although the energy problem is more serious at present, people's attitude towards it is more positive than negative, and the development of methods for mitigating climate change and optimizing the energy industry is expected.
As shown in Fig.5, the number of comments on news and reports regarding global renewable and nuclear power is gradually increasing; however, their overall share of the total number of comments remains relatively stable. Until a breakthrough in new energy technology occurs, the attitude of society towards global power sources remains stable, and the global energy industry develops steadily.
For the time distribution and emotion distribution of each power source (Fig.4), thermal power receives the most comments in total, and tidal energy receives the fewest. Wind power receives the highest proportion of positive comments, whereas nuclear power receives the lowest. Nuclear power receives the highest proportion of negative comments, whereas tidal energy receives the lowest. These results show that the global energy industry structure remains dominated by conventional energy sources such as thermal power generation; renewable energy sources such as wind and solar energy are being scaled up; and emerging energy technologies such as tidal energy are yet to be verified, so their future development prospects are unclear. With the rapid growth of renewable energy such as wind power generation, people are generally optimistic about its prospects. Owing to the effect of the Fukushima nuclear leak, people’s doubts about nuclear power have increased significantly; overall, however, support still outweighs opposition.
Fig.4 Time and emotion distribution of news and reports by power sources
Fig.5 Changes in the proportion of positive,neutral,negative emotions and the positive-negative emotion differences of news and reports by power sources
With regard to the development trend of each power source over time (Fig.5), wind energy shows the fastest growth in the proportion of positive emotion, whereas tidal energy shows the slowest (fastest declining). Nuclear energy shows the fastest growth in the proportion of negative emotion, whereas solar energy shows the slowest. Further, nuclear energy also shows the fastest growth in the positive–negative emotion difference, whereas solar energy shows the slowest. Thus, in the near future, if there is no major scientific innovation or technological revolution, wind power, solar power, and other new energy sources that are currently being widely promoted will maintain their rapid growth and gradually replace fossil energy. Although the potential of tidal energy is high, its development prospects are not optimistic owing to the limited improvement in cost-effectiveness. Further, owing to the impact of the Fukushima nuclear leak, the opposition rate for nuclear energy will continue to increase, and therefore, the development of nuclear energy is also a concern.
5 Conclusion
In this paper, we proposed an analytical model for the energy and power development trend based on global news, reports, and comments regarding energy. The proposed model employs a CNN for energy topic classification and emotional tendency recognition of the comment text, and it uses an LSTM sequence mining network to predict the future public opinion tendency of global power development. The experimental results show that the algorithm evaluates the trend of public opinion on power sources well, has strong subject analysis and sequence mining prediction ability, and can be extended to other public opinion analysis and prediction fields. According to the algorithm's analysis of public opinion on current energy development, the global energy landscape is dominated by conventional energy such as thermal power generation and supplemented by wind and solar energy. The development of nuclear energy is controversial, and tidal energy attracts little attention.
In the future, renewable energy led by wind power will gradually replace traditional energy and become the mainstream. The development of nuclear energy is controversial, and it may be difficult for nuclear energy to gain ground. Unfortunately, tidal energy does not seem to attract sufficient attention, and investment in it is insufficient, which means that its development prospects are not optimistic. These conclusions can be updated, verified, and optimized as more timely news and reports are gathered and analyzed.
In future work, we plan to optimize the design and implementation of the algorithm. By analyzing multitheme and multisequence emotions in the accumulated news, reports, and comments, we will perform more comprehensive public opinion research on power trends. We expect that this research will help us grasp and study global energy development trends in time, thereby assisting the quantitative analysis of energy development and providing theoretical guidance for building a global energy system.
Acknowledgements
This article is funded by the technical project of Global Energy Internet Group Co., Ltd.: Research on Global Energy Internet Big Data Collection and Analysis Modeling, and by the National Key Research and Development Plan of China under Grant (2018YFB0905000).
References
[1] Wang Yimin. Global Energy Internet Concept and Prospects [J]. China Electric Power, 2016, 49(3): 1
[2] Meng Xiaofeng, Ci Yan. Big Data Management: Concepts, Technology and Challenges. Research and Development of Computers, 2013, 50(1): 146-169
[3] Franks B, Huang Hai, Che Zangyang, et al. Driving Big Data [M]. 2013
[4] Yan Xiaofeng, Zhang Dexin. Big Data Research [J]. Computer Technology and Development, 2013, 4(32): 4
[5] Yu Jinzhong, Yang Xianfeng, Chen Yan, et al. Method of extracting news event elements based on mixed model [J]. Computer System Application, 2018, 27(12): 169-174
[6] Zhang Ying, Wang Zhongqing, Wang Hongling. Research on a single document extractive abstract method based on the primary and secondary relationship of the text [J]. Journal of Chinese Information Processing, 2019, 33(8): 67-76
[7] Gu Yiran, Xu Mengxin. News keyword extraction algorithm based on PageRank [J]. Journal of University of Electronic Science and Technology of China, 2017, 46(5): 777-783
[8] Aas K, Eikvil L (1999) Text categorisation: A survey
[9] Kerremans K, Temmerman R, Zhao G (2005) Terminology and knowledge engineering in fraud detection. In: Proceedings of the International Conference on Terminology and Knowledge Engineering, 8
[10] Su J, Zhang BF, Xu X (2006) The research on progress of text classification technology based on machine learning. Software 17: 1848-1859
[11] Rao Y, Lei J, Wenyin L, Li Q, Chen M (2014) Building emotional dictionary for sentiment analysis of online news. World Wide Web 17(4): 723-742
[12] Sharma A, Dey S (2012) A comparative study of feature selection and machine learning techniques for sentiment analysis. In: Proceedings of the 2012 ACM Research in Applied Computation Symposium, pp. 1-7
[13] Zhang L, Wang S, Liu B (2018) Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Rev: Data Mining Knowl Discovery 8(4): e1253
[14] Gao M, Li T, Huang P (2018) Text classification research based on improved Word2vec and CNN. In: International Conference on Service-Oriented Computing, Springer, Cham, pp. 126-135
[15] Cheng H, Yang X, Li Z, Xiao Y, Lin Y (2019) Interpretable text classification using CNN and max-pooling. arXiv preprint arXiv:1910.11236
[16] Tang G, Müller M, Rios A, Sennrich R (2018) Why self-attention? A targeted evaluation of neural machine translation architectures. arXiv preprint arXiv:1808.08946
[17] Du Z, Tang J, Ding Y (2018) Polar: Attention-based CNN for one-shot personalized article recommendation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Cham, pp. 675-690
[18] Goudjil M, Koudil M, Bedda M, Ghoggali N (2018) A novel active learning method using SVM for text classification. Int J Automat Comp 15(3): 290-298
[19] Bin H, Yi G (2019) Character-based CRF for medical entity recognition. Intell Comp Appl
[20] Niu D, Xia Z, Liu Y, Cai T, Liu T, Zhan Y (2018) ALSTM: adaptive LSTM for durative sequential data. In: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, pp. 151-157
[21] Qin K, Li C, Pavlu V, Aslam JA (2019) Adapting RNN sequence prediction model to multi-label set prediction. arXiv preprint arXiv:1904.05829
[22] Long F, Zhou K, Ou W (2019) Sentiment analysis of text based on bidirectional LSTM with multi-head attention. IEEE Access 7: 141960-141969
[23] Guo J, Xie Z, Qin Y, Jia L, Wang Y (2019) Short-term abnormal passenger flow prediction based on the fusion of SVR and LSTM. IEEE Access 7: 42946-42955
[24] Zhang Y, Yang J (2018) Chinese NER using lattice LSTM. arXiv preprint arXiv:1805.02023
[25] Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
[26] Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comp 9(8): 1735-1780