Analysis of users’ electricity consumption behavior based on ensemble clustering

doi:10.1016/j.gloei.2020.01.001

Qi Zhao¹, Haolin Li², Xinying Wang¹, Tianjiao Pu¹, Jiye Wang¹

(1. China Electric Power Research Institute, Haidian District, Beijing, 100192, P.R. China; 2. Northeastern University, 360 Huntington Ave, Boston, MA, 02115, United States of America)

Keywords

Users’ electricity consumption; Ensemble clustering; Dimensionality reduction; Cluster validity.

Abstract

Due to the increase in the number of smart meter devices, a power grid generates a large amount of data. Analyzing the data can help in understanding the users’ electricity consumption behavior and demands, thus enabling better service to be provided to them. Power load profile clustering is the basis for mining the users’ electricity consumption behavior. Considering the complexity, randomness, and uncertainty of the users’ electricity consumption behavior, this paper proposes an ensemble clustering method to analyze this behavior. First, principal component analysis (PCA) is used to reduce the dimensions of the data. Subsequently, several single clustering methods are applied, and their results are integrated by majority voting. As a result, the users’ electricity consumption behavior is classified into different modes, and their characteristics are analyzed in detail. This paper examines the electricity power data of 19 real users in China for simulation purposes, and provides a thorough analysis of, along with suggestions for, the users’ weekly electricity consumption behavior. The results verify the effectiveness of the proposed method.

1 Introduction

With the development of digitalization in a variety of industries, power systems are continuously strengthening their digital construction. As a result, more sensors, intelligent terminals, and monitoring equipment are being used. These can produce and record data for all links of power generation, transmission, substation, distribution, and consumption [1-6]. Mining this power data in depth can yield very valuable information. Trends in power grid development include breaking the barriers between business and information for power energy systems along with expanding new modes for the power industry.

The user is the service object of a power grid. Considering the requirements of the user and providing efficient, economical, accurate, and intelligent services are the main tasks the power grid faces on the consumer side. Smart meters record a significant amount of power consumption data. Making full use of this information and analyzing the users’ power consumption behavior can support load forecasting, energy efficiency management, customer classification, abnormal power consumption detection, power marketing, and other applications [7-10]. At the same time, the energy consumption situation also provides a basis for the government to make macro-decisions, such as industry adjustments and economic regulations.

The users’ electricity consumption behavior is affected by many factors, including the users’ social and economic situations and the external temperature, and it can therefore be random and complex [11-12]. Choosing an appropriate method to determine the load curve characteristics, discriminate the electricity consumption mode, and examine the consumers’ electricity usage habits and behavior is thus a problem that needs to be solved urgently.

Presently, many clustering methods are being used to research the users’ electricity consumption behavior. In [13], the fast density peak clustering algorithm is used for clustering and the factors affecting the users’ consumption behavior are shown. Wang et al. [14] converted the load curve into a matrix and used the density peak clustering technique to obtain typical dynamic consumption behavior. Li et al. [15] reduced the dimension of the load curve and used the affinity propagation (AP) algorithm to cluster the data. Ran et al. [16] processed the complex data by partitioning, selected a clustering algorithm, and divided the users’ electricity consumption behavior into eight patterns. K-means, k-medoid, and other clustering methods have also been used to determine the users’ electricity consumption behavior [17].

It is difficult for a single clustering method to meet the requirements of clustering efficiency and precision simultaneously [18]. In addition, single methods have some shortcomings in dealing with the randomness of the load [19-20]. Ensemble clustering integrates several single clustering algorithms to improve the clustering quality. The results of ensemble clustering are less affected by noise, isolated points, sampling methods, and sensitive parameters. Ensemble clustering also has the advantage of being able to evaluate clustering uncertainty [18], [21], [22].

In addition, deep learning methods [23-25] are also used to analyze the users’ electricity consumption behavior. These methods have achieved good results; however, most of them are based on supervised learning for classification and prediction. For complex power systems, these methods require considerable manual effort to sample and label the data before a prediction model can be developed. When new data types emerge, the data need to be labeled and the algorithm needs to be trained again; hence, further research is required.

Based on the above analysis, this paper proposes an ensemble clustering method to analyze the users’ electricity consumption behavior. First, the dimensionality of the users’ power data is reduced, which retains as much of the information in the data as possible while speeding up and improving the performance of the algorithm [18]. Afterwards, a variety of single clustering methods, such as ‘HDBSCAN’, ‘Affinity Propagation’, ‘Gaussian Mixture’, ‘MeanShift’, and ‘Birch’, are combined by a majority vote. This achieves the final integrated clustering, which classifies the users’ power consumption behavior so that its characteristics can be analyzed in detail. A simulation is then performed with the proposed method using the electricity power data from 19 users in China, and their weekly electricity consumption behavior is analyzed. The simulation results verify the effectiveness of the proposed method.

2 Methods

This section first discusses the dimensionality reduction methods and the cluster validity indices used to enhance the clustering performance. It then proposes the ensemble clustering method used to process the power data for analyzing the users’ electricity consumption behavior.

2.1 Dimensionality reduction algorithm

Many algorithms can be used to reduce the dimensions of power data. Some commonly used algorithms include:

(1) Principal component analysis

Principal component analysis (PCA) [26] is a widely used linear algorithm for reducing the dimensions of data. Its main principle is to construct comprehensive indices from linear combinations of the original variables. To perform PCA, the principal components of the data have to be calculated:

In the formula, X is the original data matrix, and the columns of V are the orthogonal feature vectors (eigenvectors) of X. If the data, X, are to be reduced to d dimensions, then only the first d columns of V need to be taken:
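For reference, one standard way to write these two relations is sketched below. It assumes that X has been mean-centered with one sample per row; the symbols n (number of samples), Λ (diagonal matrix of eigenvalues in descending order), V_d, and Y (the reduced data) are notation introduced here for illustration and may differ from the original formulas.

```latex
\begin{aligned}
\frac{1}{n-1}\, X^{\mathsf{T}} X &= V \Lambda V^{\mathsf{T}}, \qquad V^{\mathsf{T}} V = I \\
Y &= X V_d, \qquad V_d = [\, v_1, v_2, \dots, v_d \,]
\end{aligned}
```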

(2) Feature extraction

Feature extraction [27] derives a small set of descriptive features from the data. The extracted features generally have a specific physical meaning. For example, they can be the average value, maximum value, minimum value, variance, or other parameters that can be calculated from the original dataset.

(3) Sammon mapping

Sammon mapping is a nonlinear mapping method. Its main purpose is to minimize the objective function:

where d(xi, xj) is the distance between point i and point j in the original data space and d(yi, yj) is the distance between point i and point j in the mapping space.
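This objective is commonly known as Sammon's stress. Its standard form, written with the distances defined here and with E as an assumed name for the objective, is:

```latex
E = \frac{1}{\sum_{i<j} d(x_i, x_j)}
    \sum_{i<j} \frac{\bigl( d(x_i, x_j) - d(y_i, y_j) \bigr)^{2}}{d(x_i, x_j)}
```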

The Sammon mapping algorithm is capable of feature extraction and can represent data features intuitively and accurately; however, it is complicated and time consuming. In operation, the physical significance of its dimension reduction process can have an adverse effect on the clustering results [18]. Therefore, this study used the PCA algorithm to reduce the data dimensions.

2.2 Cluster validity index

After reducing the data’s dimensions, cluster validity indices were used to evaluate the clustering results obtained on the dimension-reduced data. Three indicators are described in this paper: the sum of squared errors (SSE), the Davies-Bouldin index (DBI), and the silhouette coefficient.

(1) Sum of squares

In the formula, yi is the sample point, which is compared with the center point of its class of samples. In addition, wi is the weight, which can be taken as the reciprocal of the number of samples in that class. In essence, SSE measures the distance from each sample point to the center of its sample group, and this distance is used to measure the clustering effect. In general, the smaller the SSE value, the better the clustering effect.
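A weighted sum of squared errors consistent with this description can be written as follows. The symbols C_k (the k-th class), c_k (its center point), and K (the number of classes) are notation assumed here for illustration.

```latex
\mathrm{SSE} = \sum_{k=1}^{K} \sum_{y_i \in C_k} w_i \,\lVert y_i - c_k \rVert^{2},
\qquad w_i = \frac{1}{|C_k|} \quad \text{for } y_i \in C_k
```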

(2) Davies-Bouldin index

To calculate the DBI [28], Si needs to be calculated first. This step is similar to the SSE process. As presented in (5), Si is the average distance from the points of group i to the center of this group:

Afterwards, the distance between the centroid of one group and that of another can be calculated using (6):

Next, the similarity, Rij, can be determined using (7):

Afterwards, the DBI can be calculated, which is equivalent to taking the mean of the maximum similarity of each group. The smaller the DBI value, the lower the similarity between clusters and thus the better the clustering effect.
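The DBI calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses the Euclidean distance, takes Si as the mean distance of a group's points to its center, and the function names `centroid` and `davies_bouldin` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled point set (lower is better)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("the DBI needs at least two groups")

    centers = {lab: centroid(pts) for lab, pts in groups.items()}
    # S_i: mean distance from the members of group i to its center
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }

    total = 0.0
    for i in groups:
        # R_ij = (S_i + S_j) / M_ij, with M_ij the center-to-center distance;
        # coincident centers (M_ij = 0) are not handled in this sketch
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in groups
            if j != i
        )
    # DBI: mean over groups of the largest similarity R_ij
    return total / len(groups)
```

For two tight groups that are far apart, for example the points (0, 0), (0, 2) and (10, 0), (10, 2), each Si is 1 and the center distance is 10, so the index is 0.2.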

(3) Silhouette coefficient

To calculate the silhouette coefficient [29], the average distance ai from sample i to the other samples in its own group is determined first; ai is the in-cluster dissimilarity of sample i. The smaller the value of ai, the more strongly sample i belongs in its group. Next, the average distance bij from sample i to all of the samples in another cluster Cj is calculated; this is the dissimilarity between sample i and cluster Cj. The between-cluster dissimilarity of sample i is then defined as bi = min(bi1, bi2, ..., bin).

The silhouette coefficient s(i) can be obtained by the following rules:

It can be seen from the formula that the closer the silhouette coefficient is to 1, the more strongly the point belongs to its current group; the closer it is to -1, the more the point belongs to another group.
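As an illustration, the silhouette value of every sample can be computed as below. The sketch uses the compact form s(i) = (bi - ai) / max(ai, bi), which is equivalent to the usual piecewise rules; the Euclidean distance, the zero score for single-member groups, and the function name `silhouette_values` are assumptions made for this example.

```python
import math


def silhouette_values(points, labels):
    """Per-sample silhouette s(i) = (b_i - a_i) / max(a_i, b_i)."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two groups")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            # Assumed convention: a sample alone in its group scores 0
            scores.append(0.0)
            continue
        # a_i: mean distance to the other members of the sample's own group
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the members of any other group
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores
```

The overall silhouette coefficient of a clustering is then the mean of these per-sample values.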

Considering the above equations, the SSE has several disadvantages. For example, when comparing clustering results obtained with input data of different dimensions, the value of SSE is not credible. This is because the SSE is a sum of squared distances: as the dimension increases, the SSE inevitably increases as well. This makes it unsuitable for selecting the appropriate reduced dimension; therefore, this study did not employ the SSE.

For DBI, the distance between two group centers and the distance between a group center and a group point will both change as the dimension decreases or increases. Because the DBI divides one of these distances by the other, the influence of the dimensional change on its value is significantly reduced. Thus, DBI is a suitable way to measure the effectiveness of clustering data of different dimensions. Similarly, since the silhouette coefficient is calculated by dividing one distance by another, it is insensitive to changes in the dimension of the input sample.

2.3 Ensemble clustering

Ensemble clustering is a method that can effectively improve the data clustering effect. In essence, ensemble clustering combines the results of several single clustering models applied to the same set of data to give a comprehensive result. Its advantage is that it can avoid some problems that are difficult or impossible to avoid with a single clustering model, which gives a more reasonable and appropriate clustering result [30]. The implementation of ensemble clustering is generally divided into three steps. The first step is data processing. The second step uses different clustering algorithms to perform operations on the same set of data to obtain individual results. The third step merges the different clustering results, for which the available methods include majority voting and charting methods.

Many clustering algorithms can be used before integration. In this paper, ‘HDBSCAN’, ‘Affinity Propagation’, ‘Gaussian Mixture’, ‘Mean-Shift’, and ‘Birch’ are used as the single clustering algorithms. These algorithms are insensitive to the cluster shape and are more efficient, unlike k-means, whose clusters always have a fixed shape, or agglomerative clustering, which has poor efficiency [16,31]. After that, this paper uses the majority voting method to integrate the clustering results. The majority voting method runs the different clustering algorithms and integrates the results of all of them, generating ensemble clustering results under different thresholds: a classification result supported by a proportion of the single clusterers that exceeds the set threshold is chosen as the final classification. The advantages are obvious, especially when the number of single clusterers is sufficient and they are independent of each other; in that case, the voting method can reduce the clustering error exponentially.
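The voting step can be made concrete with a small sketch. The exact voting rule is not spelled out here, so the code below follows one common reading and should be taken as an assumption: two samples are linked when the fraction of single clusterers that place them in the same cluster exceeds the threshold, and the final clusters are the connected groups of linked samples. The function name `majority_vote_ensemble` is hypothetical.

```python
def majority_vote_ensemble(labelings, threshold):
    """Merge several base clusterings of the same samples by pairwise voting.

    labelings: list of label lists, one per single clusterer, all of equal length.
    threshold: fraction of clusterers that must agree for two samples to be linked.
    Returns one consensus label per sample, numbered from 0 in order of appearance.
    """
    n = len(labelings[0])
    if any(len(lab) != n for lab in labelings):
        raise ValueError("all labelings must cover the same samples")

    parent = list(range(n))  # union-find forest over the samples

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Vote: how many clusterers put samples i and j together?
            votes = sum(1 for lab in labelings if lab[i] == lab[j])
            if votes / len(labelings) > threshold:
                parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

With this rule, a higher threshold generally produces more, smaller clusters. Noise labels such as the -1 that HDBSCAN assigns would need to be excluded from the vote before use; this sketch does not do so.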

3 Simulation

In this section, a simulation was conducted on the power data. After selecting the parameters, ensemble clustering was used to classify the power data into different modes and to analyze the characteristics of each mode. The users’ electricity consumption behavior was also briefly analyzed.

3.1 Data used for the ensemble cluster

This paper used the power data from 19 users. For each user, power data were collected for the first 14 days (two weeks) of January. The data collection frequency was one data point every 15 minutes; thus, each user had 96 data points per day. To facilitate the interception and observation of the sample points, this paper treated each user’s 96 power data points (one day) as one sample point for subsequent operations. This allows the user’s short-term (one-day) power consumption to be observed directly, and also makes it convenient to combine multiple days when observing the long-term power consumption of the users. In the computational analysis, the dimensions of the data were first reduced, and different clustering algorithms were then applied separately. Afterwards, the majority voting method was used to integrate the results.
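The sample construction can be sketched as follows. The input layout (a dictionary mapping each user to one flat list of readings) and the function name `daily_samples` are assumptions for illustration; if every user contributes all 14 days, the 19 users yield 19 × 14 = 266 samples of 96 points each.

```python
POINTS_PER_DAY = 96  # one reading every 15 minutes


def daily_samples(readings_by_user):
    """Split each user's readings into (user, day, 96-point curve) samples."""
    samples = []
    for user, readings in readings_by_user.items():
        if len(readings) % POINTS_PER_DAY != 0:
            raise ValueError(f"user {user}: incomplete day in the readings")
        for day in range(len(readings) // POINTS_PER_DAY):
            start = day * POINTS_PER_DAY
            curve = readings[start:start + POINTS_PER_DAY]
            samples.append((user, day + 1, curve))  # days are numbered from 1
    return samples
```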

3.2 Parameter selection

(1) Selecting the optimal clustering data dimension

The values of the DBI and silhouette coefficient calculated for the clustering results in different dimensions are presented in Fig. 1. The x-axis indicates the input data dimension, and the y-axis indicates the DBI and silhouette coefficient, respectively.

Fig. 1 The (a) DBI and (b) silhouette coefficient for the clustering results over a range of dimensions

Fig. 1 (a) shows the DBI over the range of dimensions, whereas Fig. 1 (b) shows the silhouette coefficients. Comparing these figures, there is no clear relationship between the two cluster validity indices and the change in dimension over the full range. When the data dimension is low, however, the value of the DBI is small and the silhouette coefficient is relatively high. Since the range of the x-axis in the figure is large, the low-dimension part is shown separately for comparison purposes.

Fig. 2 shows the DBI and the silhouette coefficients, enlarged, for the lower dimension values. As demonstrated, the silhouette coefficient achieves its maximum value when the dimension is four; however, the DBI does not attain its minimum value there. When the dimension is seven, the DBI is at a minimum, yet the silhouette coefficient is not high. Therefore, a dimension of four was selected.

Fig. 2 The DBI and silhouette for the lower dimension values

(2) Selecting the appropriate number of clustering categories

Some clustering algorithms, such as GMM and k-means, need the number of clusters to be specified before clustering. To further improve the clustering effect, different numbers of input groups, K, were explored under different input data dimensions. The clustering results are illustrated in Fig. 3. The horizontal coordinate of Fig. 3 is the number K of input cluster categories, and the ordinate is the DBI and the silhouette coefficient, respectively. Each line represents the results for input data of a different dimension, as indicated in the legend.

Fig. 3 The (a) DBI and (b) silhouette scores for the dimension data

Fig. 3 (a) shows the DBI results while Fig. 3 (b) displays the silhouette coefficient results. It can be seen from these figures that the silhouette coefficient has the highest value when the input data dimension is set to five and is only slightly lower when the dimension is set to four. In addition, the DBI at a dimension of four (lower blue line) is much smaller than at a dimension of five (the central green line). The silhouette coefficients for the different dimensions do not differ much; however, as the dimension increases, the DBI also increases. Therefore, the input data dimension was chosen to be four, and the range of 6 to 16 was selected for the input group number K.

3.3 The result of ensemble clustering

Before clustering, the input data were uniformly reduced to four dimensions using PCA. For the clustering algorithms that require the number of clustering categories as an input, the number of categories, K, ranged from 6 to 16. All of the results were combined using the majority vote, which was applied with different thresholds to obtain separate results.

With the input data reduced to four dimensions, the silhouette coefficients obtained with the different vote thresholds are listed in Table 1.

Table 1 The different thresholds with their corresponding silhouette coefficients

Threshold                 0.3       0.5      0.6       0.7      0.9
Silhouette coefficient    0.21279   0.4044   0.47297   0.4844   0.25218

The closer the silhouette coefficient is to 1, the better the clustering result. According to Table 1, the threshold value of 0.7 gives the highest silhouette coefficient and was therefore selected for the final result. With this threshold, the data are classified into 13 different modes. The specific classification results are shown in Fig. 4. In Fig. 4, the horizontal coordinate is in units of 15 minutes; hence, there are 96 points in total, which indicate the time of day, and the ordinate indicates the power. The curves of different colors represent different sample points, that is, the power usage of the users on different days.

The results show that the electricity consumption of the 19 users is divided into 13 modes. This paper analyzes and discusses these 13 modes of electricity consumption. Line graphs of the users’ power consumption over one week are then presented, followed by an analysis of their electricity consumption habits.

Fig. 4 The results of the 13 different modes

3.4 Analysis of the ensemble cluster

To show more details of each mode, this paper analyzes the clustering results in terms of the following characteristics: high power consumption; relatively low power consumption; low power consumption; stable electricity consumption; unstable electricity consumption; the daytime peak type; the nighttime peak type; and no obvious peak type.

(1) Determining high and low electricity consumption

To determine whether each mode has high or low power consumption, this study averaged the values of all of the curves in the same classification. The average power consumption values of the 13 modes are listed in Table 2.

Table 2 The average power consumption values of the 13 different modes

Label   1       2       3       4       5       6       7
Ave     7.9     37.9    33.87   63.43   62.67   65.33   42.15

Label   8       9       10      11      12      13
Ave     39.55   114.6   153.2   130     144.67  21.84

As shown in Table 2, Labels 1, 2, 3, 7, 8, and 13 are low-power modes. In contrast, Labels 9, 10, 11, and 12 are high-power modes.

(2) Determining if the power is stable or unstable

For this investigation, the variance was used to determine whether the power is stable or unstable. The raw variance, however, can be misleading. In some modes, the power consumption curve features a large jump but is otherwise smooth during the consumption period, as demonstrated by Label 4. In other modes, there is no large jump, but the curve oscillates between a high and a low value at a high frequency, as in Label 3. When the average variances of the curves are compared between modes, a mode that is smooth but features a large jump (e.g. Label 4) has a large variance and would be considered unstable, whereas a mode that oscillates with a difference between high and low values smaller than the jump (e.g. Label 3) has a small variance and would be considered stable. Since this is inconsistent with the intended meaning of stability, this study uses the Fourier transform and the inverse Fourier transform to deal with the problem.

The Fourier transform expresses the frequency characteristics of curves or discrete data. For the cluster modes above, the large jumps, which have an extremely low frequency, can be eliminated so that only the high-frequency power oscillations are considered. The variance of the remaining high-frequency data is then compared to determine whether the power consumption of a mode is stable. This method can be implemented through the following five steps:

1) Find the discrete Fourier transform for the original discrete data.

2) After obtaining the spectrum, set the low-frequency components to values close to 0, so that they are much smaller than the high-frequency components.

3) Perform an inverse Fourier transform on the processed spectrum from the previous step to obtain data in which the low-frequency information has been reduced.

4) Find the variance for the new data.

5) Compare the variances of the different modes and obtain the results.
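The five steps above can be sketched with a plain discrete Fourier transform. The cutoff frequency and the attenuation factor are not specified in the text, so `cutoff` and `attenuation` below are hypothetical parameters chosen for illustration, as is the function name `high_frequency_variance`.

```python
import cmath
import statistics


def dft(x):
    """Discrete Fourier transform of a real sequence (direct O(n^2) form)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def high_frequency_variance(curve, cutoff=4, attenuation=0.0):
    """Variance of a curve after suppressing its lowest-frequency components."""
    n = len(curve)
    spectrum = dft(curve)                      # step 1
    for k in range(n):
        # Bins k and n - k describe the same frequency, min(k, n - k)
        if min(k, n - k) < cutoff:
            spectrum[k] *= attenuation         # step 2
    filtered = idft(spectrum)                  # step 3
    return statistics.pvariance(filtered)      # step 4


# Step 5: compare the resulting variances between modes, for example by
# averaging high_frequency_variance over all daily curves of each mode.
```

With these settings, a constant curve or a single slow daily swing gives a variance close to 0, whereas a curve that alternates between two values at every reading keeps its full variance.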

Following these five steps, the average variance of each mode was determined, as presented in Table 3.

Table 3 Average variance after performing a Fourier transform for each mode

Label          1        2      3      4      5      6      7
Ave (10^-6)    0.0018   1.75   1.2    5.1    4.8    4.7    3.04

Label          8        9      10     11     12     13
Ave (10^-6)    2.47     12.3   10.2   62.9   35.7   2.72

From the above results, Labels 1, 2, 3, 4, 5, 6, 7, 8, and 13 all display stable power consumption modes. Meanwhile, Labels 9, 10, 11, and 12 are unstable, especially 11 and 12. These conclusions are consistent with the actual observations.

(3) Determining the daytime and nighttime peaks

To determine the daytime and nighttime peaks of the modes, this paper extracts an average curve for each classification by averaging, at every time point, all of the power curves in that classification. The classification is as follows:

As demonstrated by the 13 average curves, Labels 1, 2, 3, and 12 are modes that feature a day-time peak. Labels 8 and 11 are also considered to be day-time peak modes. In addition, Labels 4, 5, 6, 7, and 13 peak at night. Meanwhile, Labels 9 and 10 do not have obvious peaks since they use power consistently throughout the day. This information is summarized in Table 4.

Fig. 5 The average curves of the 13 different modes

Table 4 Characteristics for each label

Number   Characteristic                                         Type
1        Low consumption, stable, day-time peak                 Label 1
2        Relatively low consumption, stable, day-time peak      Label 2, Label 3, Label 8
3        High consumption, stable, night-time peak              Label 4, Label 5, Label 6
4        Relatively low consumption, stable, night-time peak    Label 7
5        High consumption, unstable, no peak                    Label 9, Label 10
6        High consumption, unstable, day-time peak              Label 11, Label 12
7        Relatively low consumption, stable, night-time peak    Label 13

3.5 The classified users’ electricity consumption behavior based on the consumption mode

This subsection explores each user’s one-week power usage. This analysis was performed by counting the types of electricity usage for each user every day and observing the changes in daily electricity consumption over one week.

To observe the power consumption behavior of each user clearly without analyzing too much data at once, this study plots each user’s power consumption data day by day. The power consumption classification for each day is given in the legend, making it intuitive to analyze the behavior patterns and changes in the user’s power consumption mode. The power consumption behavior of the users is described as follows. In Fig. 6, the x-axis is in units of 15 minutes with 96 points in total, indicating the time of day, and the y-axis is the magnitude of the electric power. The multicolored curves represent the power consumption on each day of the week for the same user. The number corresponding to each colored curve indicates the clustering mode generated by the ensemble clustering.

As illustrated in Fig. 6, the electricity consumption behavior of most users is habitual; in other words, the power consumption behavior remains consistent within one week. For instance, user 1 peaks during the day-time and is a relatively low power consumption user. Meanwhile, users 6 and 17 have their consumption peaks at night-time. Only a few users (e.g. Labels 9, 10, and 11) change their power consumption behavior within one week. From these observations, the users’ power consumption habits and types over a week can be determined. In addition, for users who mainly consume at night, the price of electricity can be reduced to encourage them to use more power at night, which would be better for the power grid’s health. Analyzing the users’ daily electricity consumption based on the mode classification gives a better understanding of the users’ electricity consumption behavior, which can provide a reliable basis for future work.

Fig. 6 The one-week results for some of the users

4 Conclusion

This paper discusses ensemble clustering for power data combined with dimensionality reduction. The DBI and the silhouette coefficient were used to evaluate the clustering results for different dimensions, and the relationship between the data dimension and the clustering performance was examined. A suitable data dimension was then chosen to obtain the best result. Afterwards, a majority vote over the single clustering methods was applied for ensemble clustering, and 13 different electricity consumption modes were identified. The specific characteristics and practical significance of each mode were determined. Furthermore, the users’ electricity consumption behavior over one week was analyzed based on these modes, which can provide guidance for future work. This paper shows that ensemble clustering can be successfully applied to power data analysis. In the future, more accurate results may be obtained by changing the single clustering algorithms used in the ensemble and by using other integration methods for the ensemble clustering.

Acknowledgements

This work was supported by the State Grid Science and Technology Project (No. 5442AI90009) and the Natural Science Foundation of China (No. 6170337).

References

[1]
Zhang D,Qiu RC (2018) Research on big data applications in Global Energy Interconnection.Glob Energ Inter.1(3):352-357 [百度学术]
[2]
Liang X (2018) Application and research of global grid database design based on geographic information.Glob Energ Interc.1(1):87-95 [百度学术]
[3]
Chinese society for electrical engineering informatization committee,Chinese electric power big data development white paper (2013) In:Chinese society for electrical engineering,Beijing,China,2013 [百度学术]
[4]
Wang Y,Chen Q,Hong T (2019) Review of Smart Meter Data Analytics:Applications,Methodologies,and Challenges.IEEE Trans Smart Grid.10(3):3125-1763 [百度学术]
[5]
Wang J,Ji Z,Shi M et al (2015) Scenario analysis and application research on big data in smart power distribution and consumption systems.Proc.CSEE.35(8):1829-1836 [百度学术]
[6]
Zhao T,Zhang Y,Zhang D (2014) Application technology of big data in smart distribution grid and its prospect analysis.Power Syst Tech.38(12):3305-3312 [百度学术]
[7]
Lu J,Zhu Y,Peng W et al (2017) Feature Selection Strategy for Electricity Consumption Behavior Analysis in Smart Grid.Auto Elec Power Syst.41(5):58-63 [百度学术]
[8]
Tian S,Wang B,Zhang J (2014) Key technologies for demand response in smart grid.Proc CSEE.34(22):3576-3589 [百度学术]
[9]
Xin M,Zhang Y,Xie D (2019) Summary of Researches on Consumer Behavior Analysis Based on Big Power Data Elec Auto.41(1):1-4 + 27 [百度学术]
[10]
Liu D,Wang M,Jiang Y,et al (2019) Design of Rapid Demand Response Market Mechanism Based on Load Quality Grading Utilization.Glob Ener Inter 2(3):224-234 [百度学术]
[11]
Wang Y,Chen Q,Sun M et al (2018) An Ensemble Forecasting Method for the Aggregated Load With Subprofile.IEEE Trans Smart Grid,9(4):3906-3908 [百度学术]
[12]
Liu F (2017) Research on smart grid big data outlier detection and analysis of electricity behavior based on density peaks cluttering algorithm.Beijing:China Elec Power Res Inst. [百度学术]
[13]
Li S,Yuan Z,Ding J,et al (2019) Analysis of users’electricity behavior and influencing factors based on clustering.Power DSM,21(3):53-58 [百度学术]
[14]
Wang Y,Chen Q,Kang C et al (2016) Clustering of Electricity Consumption Behavior Dynamics Toward Big Data Application.IEEE Trans Smart Grid,7(5):2437-2447 [百度学术]
[15]
Li C,Cai W,Zhao R et al (2019) Customer Behavior Analysis Based on Affinity Propagation Algorithm with Optimized SAX and Weighted Load Characteristic Indices.Trans China Elec Soc.34(1):3125-3148 [百度学术]
[16]
Ran R,Liu Y,Li Z et al (2019) Research of electricity consumption pattern recognition based on cluster analysis.Power Syst Big Data.22(4):43-49 [百度学术]
[17]
McLoughlin F,Duffy A,Conlon M (2015) A clustering approach to domestic electricity load profile characterization using smart metering data.Appl Energ 141:190-199 [百度学术]
[18]
Zhang B,Zhuang C,Hu J et al (2019) Ensemble Clustering Algorithm Combined with Dimension Reduction Techniques for Power Load Profiles.Proc CSEE.35(15):3741-3749 [百度学术]
[19]
Darshan S,Soham B (2014) Comparison of Basic Clustering Algorithms.Intl J Comp Sci Mobil Comp,3(10):58-61 [百度学术]
[20]
Steinbach M,Karypis G,Kumar V (2000) A Comparison of Document Clustering Techniques.Paper presented at the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data mining,Boston,MA,USA,20-23 [百度学术]
[21]
Tahani A,Wenjia W (2015) Clustering ensemble method.Intl J Mach Learn Cybernet 10(6):1227-1246
[22]
Zhan JM,Chen JT,Xing JQ (2017) Research advance of clustering ensemble algorithm.Paper presented at the Proceedings of the 2017 International Conference on Wavelet Analysis and Pattern Recognition,Ningbo,China,9-12
[23]
Wang Y,Chen Q,Gan D,et al (2019) Deep Learning-Based Socio-Demographic Information Identification from Smart Meter Data.IEEE Trans Smart Grid 10(3):2593-2602
[24]
Wang K,Lu J,Xu Z,et al (2019) Classification Method of Unbalanced Power Consumption Data Based on Prior Knowledge and Deep Boltzmann Machine Sampling.Auto Elec Power Syst 43(20):57-64
[25]
Zhu Z,Sun Y,Ji Y (2007) Short-term Load Forecasting Based on EMD and SVM.High Volt Eng 5
[26]
Wold S,Esbensen K (1987) Principal component analysis.Chemom Intell Lab Syst 2(1):37-52
[27]
Khalid S,Khalil T,Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning.Paper presented at the Science and Information Conference,London,UK,27-29
[28]
Davies DL,Bouldin D (1979) A Cluster Separation Measure.IEEE Trans Patt Anal Mach Intell PAMI-1(2):224-227
[29]
Rousseeuw PJ (1987) Silhouettes:A Graphical Aid to the Interpretation and Validation of Cluster Analysis.Comp Appl Math 20:53-65
[30]
Wu X,Ma T,Cao J et al (2018) A comparative study of clustering ensemble algorithms.Comp Elec Eng 68:603-615
[31]
Athman B,Yu Q,Liu X et al (2015) Efficient agglomerative hierarchical clustering.Exp Syst Appl 42(5):2785-2797

Fund Information

Supported by the State Grid Science and Technology Project (No.5442AI90009) and the Natural Science Foundation of China (No.6170337).

Author

Qi Zhao

Qi Zhao received her master’s degree from the University of Tokyo.She works at the China Electric Power Research Institute.Her main research interest is the application of artificial intelligence technology in power systems.
Haolin Li

Haolin Li received his bachelor’s degree from Beijing Institute of Technology.He is currently a second-year graduate student at Northeastern University.His main research interests are machine learning,deep learning,and data analysis.
Xinying Wang

Xinying Wang received his Ph.D.degree from Dalian University of Technology.He works at the China Electric Power Research Institute.His main research interest is artificial intelligence for electric power systems.
Tianjiao Pu

Tianjiao Pu,professor-level senior engineer,is the director of the Artificial Intelligence Application Research Department of the China Electric Power Research Institute.His main research interests include power dispatching automation,smart grid simulation,active distribution networks,and artificial intelligence.
Jiye Wang

Jiye Wang,professor-level senior engineer and doctoral supervisor,is the Director,General Manager (Executive Vice President) of the China Electric Power Research Institute.His main research interests are electric power information and communication,artificial intelligence,and the energy internet.

Publish Info

Received：2019-08-31

Accepted：2019-10-09

Published：2019-12-25

Reference： Qi Zhao,Haolin Li,Xinying Wang,et al.(2019) Analysis of users’electricity consumption behavior based on ensemble clustering.Global Energy Interconnection,2(6):479-488.
