Windfarm Processing

Processing of the windfarm database consists of filtering the data, adding powercurve, and adding missing data

Load Windturbine Data

filename = 'TheWindPowerDatabaseWithRotorParameters25Mar.csv';
opts = detectImportOptions(['WT_data\' filename],'Encoding','UTF-8');
t=readtable(['WT_data\' filename],opts);

Basic Filtering

Remove offshore turbine
t(~strcmp(t.Offshore_Shore_distance,'No'),:)=[];
Remove turbine not in production (i.e., Construction, Planned, Approved or dismantled)
t(~strcmp(t.Status,'Production'),:)=[];
Remove data without location information
t(isnan(t.Latitude) | isnan(t.Longitude),:)=[];
Remove data outside of study area as defined by the bird datase. First, we assign to each windfarm the grid id of its location in the bird database and then we filter for windfarm further than 50km from the closest grid id.
load('../2018/data/Density_estimationMap','g');
[dist,grid_id] = min(pdist2([g.lat2D(g.latlonmask) g.lon2D(g.latlonmask)],[t.Latitude t.Longitude],@lldistkm));
t.grid_id = grid_id';
t(dist>50,:)=[];
Display table
t
t = 12518×34 table
 IDContinentISO_codeCountryState_codeAreaCityNameX2nd_nameLatitudeLongitudeAltitude_DepthLocation_accuracyOffshore_Shore_distanceManufacturerTurbineHub_heightNumber_of_turbinesTotal_powerDeveloperOperatorOwnerCommissioning_dateStatusDecommissioning_dateLinkUpdateturbine_maketurbine_capacityturbine_rotorDiameter
11256'Europe''AT''Austria''NA''Niederösterreich''Ertl''Ertl''NA'47.967014.6600'NA''Yes''No''Enercon''E40/500'651500'NA''Prenn Franz''NA'2000'Production''''Link'29-Apr-0019 00:00:00'Enercon'50040.3000
21247'Europe''AT''Austria''NA''Oberösterreich''Eberschwang''Eberschwang''NA'48.145013.5190'620''Yes''No''Enercon''E40/500'5021000'Windkraft Innviertel GmbH/EWS Consulting''Windkraft Innviertel GmbH''NA'NaN'Production''''Link'29-Apr-0019 00:00:00'Enercon'50040.3000
31252'Europe''AT''Austria''NA''Oberösterreich''Laussa''Laussa''Plattenberg'47.952014.4920'NA''Yes''No''Tacke''TW-600'5031800'EWS Consulting''Energie AG''Erneuerbare Energie Laussa GmbH'1996'Production''''Link'29-Apr-0019 00:00:00'Tacke'60043.0000
416707'Europe''AT''Austria''NA''Oberösterreich''Munderfing''Munderfing''Kobernaußerwald'48.055013.2420'NA''Yes''No''Vestas''V112/3000'140515000'NA''NA''Municipality of Munderfing/EWS Consulting'NaN'Production''''Link'29-Apr-0019 00:00:00'Vestas'3000112.0000
51251'Europe''AT''Austria''NA''Oberösterreich''Oberrödham''Oberrödham''NA'48.330013.6620'NA''Yes''No''Vestas''V47/660'6531980'EWS Consulting''Energie AG''NA'1999'Production''''Link'29-Apr-0019 00:00:00'Vestas'66047.0000
61254'Europe''AT''Austria''NA''Oberösterreich''Schenkenfelden''Schenkenfelden''NA'48.510014.3500'NA''Yes''No''Micon''M1800-600/48'6021200'EWS Consulting''Future Energy GmbH''WEB Windenergie AG'1998'Production''''Link'20-Feb-0020 00:00:00'Micon'60048.0000
71249'Europe''AT''Austria''NA''Oberösterreich''Schernham''Schernham''NA'48.179013.6090'NA''Yes''No''Enercon''E66/1800'8611800'Windkraft Innviertel GmbH''Windkraft Innviertel GmbH''NA'NaN'Production''''Link'29-Apr-0019 00:00:00'Enercon'180070.0000
81257'Europe''AT''Austria''NA''Oberösterreich''Spörbichl''Spörbichl''NA'48.552014.6000'NA''Yes''No''Vestas''V47/660'6521320'EWS Consulting''New Energy/Austrian Wind Power GmbH''NA'1999'Production''''Link'29-Apr-0019 00:00:00'Vestas'66047.0000
91248'Europe''AT''Austria''NA''Oberösterreich''Steiglberg''Steiglberg''NA'48.096013.3560'NA''Yes''No''Vestas''V80/2000'10012000'EWS Consulting''NA''Windkraft Simonsfeld GmbH & Co KG'2002'Production''''Link'16-Sep-0019 00:00:00'Vestas'200080.0000
101250'Europe''AT''Austria''NA''Oberösterreich''Steindlberg''Steindlberg''NA'48.322013.6660'NA''Yes''No''Vestas''V47/660'6521320'EWS Consulting''Energy from A to Z Gmbh & Co KGa''NA'2001'Production''''Link'29-Apr-0019 00:00:00'Vestas'66047.0000

Load Power Curve data

filename = 'Power_curves_20210319.xls';
opts = detectImportOptions(['WT_data\' filename],'Sheet','Power_curves');
tp=readtable(['WT_data\' filename],opts);
powerCurve.ws = table2array(tp(1,6:end-2));
powerCurve.ManufucturerName = tp.ManufucturerName(2:end);
powerCurve.TurbineName = tp.TurbineName(2:end);
powerCurve.Power = table2array(tp(2:end,6:end-2));

Smoothing Powercurve

0.2+0.1*w (Jourdier, 2020 ; 10.5194/asr-17-63-2020): 30 min
0.6 + 0.2*w (Staffell & Pfenninger 2016 ; 10.1016/j.energy.2016.08.068) : 1hr MERRA
1.0 (Olauson & Bergkvist, 2015) 1hr MERRA
1.5 (Olauson, 2018)
powerCurve.ws_smooth=powerCurve.ws(1):0.1:powerCurve.ws(end);
powerCurve.Power_smooth = nan(size(powerCurve.Power,1),numel(powerCurve.ws_smooth));
for i_p=1:size(powerCurve.Power,1)
p = interp1(powerCurve.ws,powerCurve.Power(i_p,:),powerCurve.ws_smooth);
% figure; plot(powerCurve.ws,powerCurve.Power(i_p,:),'or',powerCurve.ws_smooth,p,'-k')
for i_w=1:numel(powerCurve.ws_smooth)
s = 0.6+0.2*powerCurve.ws_smooth(i_w);
weight = normpdf(powerCurve.ws_smooth,powerCurve.ws_smooth(i_w),s);
weight = weight ./sum(weight);
powerCurve.Power_smooth(i_p,i_w)=weight*p';
end
% figure; hold on; plot(powerCurve.ws,powerCurve.Power(i_p,:)); plot(powerCurve.ws_smooth,powerCurve.Power_smooth(i_p,:));
end

Add Powercurve to windturbine

[~,Locb] = ismember(string([t.Turbine t.Manufacturer]) , string([powerCurve.TurbineName powerCurve.ManufucturerName]),'rows');
% t2=t(Locb==0,:);
t.powerCurve = nan(size(t,1),numel(powerCurve.ws_smooth));
t.powerCurve(Locb~=0,:) = powerCurve.Power_smooth(Locb(Locb~=0),:);
It would be nicer to do use the join function, but thes function requires that all windfarms have a matching powercurve. Hence, we use ismemeber()
%t2=join(t,tp,'LeftKeys','ID','RightKeys','Turb_ID');

Assessed missing data

a=sprintf([num2str(size(t,1)) '\t windfarms\n' ]);
a=[a sprintf([num2str(sum(isnan(t.Number_of_turbines))) '\t windfarms without number of windturbine\n' ])];
a=[a sprintf([num2str(sum(isnan(t.turbine_sweptArea))) '\t windfarms without swept area\n' ])];
a=[a sprintf([num2str(sum(isnan(t.Hub_height))) '\t windfarms without hub height\n' ])];
a=[a sprintf([num2str(sum(all(isnan(t.powerCurve),2))) '\t windfarms without power curve\n' ])];
disp(a)
12518 windfarms
154 windfarms without number of windturbine
166 windfarms without swept area
166 windfarms without hub height
3259 windfarms without power curve
figure('position',[0 0 1000 1000]);
subplot(6,1,1); histogram(t.Number_of_turbines); xlabel('Number of turbines'); ylim([0.5 10000]);set(gca, 'YScale', 'log')
subplot(6,1,2); histogram(t.turbine_sweptArea); xlabel('Swept Area'); ylim([0.5 10000]); set(gca, 'YScale', 'log')
subplot(6,1,3); histogram(t.Hub_height); xlabel('Hub height'); axis tight;set(gca, 'YScale', 'log')
subplot(6,1,4); histogram(t.turbine_capacity); xlabel('Turbine Capacity'); axis tight;set(gca, 'YScale', 'log')
subplot(6,1,[5 6]); hold on;
plot(powerCurve.ws_smooth, nanmean(t.powerCurve/1000),'k');
plot(powerCurve.ws_smooth, nanmedian(t.powerCurve/1000),'-.k');
plot(powerCurve.ws_smooth, quantile(t.powerCurve/1000,0.1),'--k');
plot(powerCurve.ws_smooth, quantile(t.powerCurve/1000,.9),'--k');
ax=gca; ylabel('Power MW'); xlabel('Windspeed')

Fill gaps

Several method exist for missing data: median, GLM or kNN (or more sofisticated like MICE)
First, we explore relationships between the variable of interest to see which variable can be used, and what kind of relationship there are among them
var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power'};
figure('position',[0 0 1000 1000]);
for i1=1:numel(var_test)
for i2=(i1+1):numel(var_test)
subplot(numel(var_test),numel(var_test),i2+(i1-1)*numel(var_test)); hold on
plot(t.(var_test{i2}),t.(var_test{i1}),'.k');
if i2==i1+1, xlabel(strrep(var_test{i2},'_',' ')); ylabel(strrep(var_test{i1},'_',' ')); end
l=lsline; l.Color='r';
box on;
end
end
Based on this plot, we can see that turbine_sweptArea, turbine_capacity and Hub_height are linearly linked while Total_power and number of turbines are also linked
We then use GLMs to predict the data and created a table with imputated value tim
tim=t;
Check occurane of nan for the three variable
X=[tim.turbine_sweptArea tim.turbine_capacity tim.Hub_height];
[a1,~,c] = unique(isnan(X),'rows');
[a1 histcounts(c)']
ans = 3×4
0 0 0 12350
0 1 0 2
1 1 1 166
Hub height can be determined from swept area and turbine capacity for 4K cases, the other variable can only be estimated in 1 case... (so not useful)
We build a glm with all the data and predict only for nan value in Hub_height:
mdl = fitglm([tim.turbine_sweptArea tim.turbine_capacity],tim.Hub_height);
[ypred,yci] = predict(mdl,[tim.turbine_sweptArea(isnan(tim.Hub_height),:) tim.turbine_capacity(isnan(tim.Hub_height),:)]);
ypred(ypred>max(tim.Hub_height))=max(tim.Hub_height); % Avoid missleading value of hubheight
tim.Hub_height(isnan(tim.Hub_height)) = ypred;
We performe the same analysis for Number_of_turbines and Total_power
Check occurane of nan
X=[tim.Number_of_turbines tim.Total_power];
[a1,~,c] = unique(isnan(X),'rows');
[a1 histcounts(c)']
ans = 4×3
0 0 12350
0 1 14
1 0 153
1 1 1
We performe the prediction for both variable as above
mdl = fitglm(tim.Number_of_turbines, tim.Total_power);
mdl2 = fitglm(tim.Total_power,tim.Number_of_turbines);
[ypred,yci] = predict(mdl,tim.Number_of_turbines(isnan(tim.Total_power),:));
tim.Total_power(isnan(tim.Total_power)) = ypred;
[ypred,yci] = predict(mdl2,tim.Total_power(isnan(tim.Number_of_turbines),:));
tim.Number_of_turbines(isnan(tim.Number_of_turbines)) = ypred;
We can then assess the missing value in the new imputated table
a=sprintf([num2str(size(tim,1)) '\t windfarms\n' ]);
for i=1:numel(var_test)
a=[a sprintf([num2str(sum(isnan(tim.(var_test{i})))) '\t windfarms without ' var_test{i} '\n' ])];
end
disp(a)
12518 windfarms
166 windfarms without turbine_sweptArea
168 windfarms without turbine_capacity
166 windfarms without Hub_height
1 windfarms without Number_of_turbines
1 windfarms without Total_power
The remaining missing data can be filled with the median value.
% tim=t; % in case GLM is not used above
for i=1:numel(var_test)
tim.(var_test{i})(isnan(tim.(var_test{i})))=nanmedian(tim.(var_test{i}));
end
Fill also powercurve with median. Not used anymore, see below
% tmp = all(isnan(tim.powerCurve),2);
% tim.powerCurve(tmp,:) = repmat(nanmedian(tim.powerCurve),sum(tmp),1);
var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power'};
figure('position',[0 0 1000 1000]);
for i1=1:numel(var_test)
for i2=(i1+1):numel(var_test)
subplot(numel(var_test),numel(var_test),i2+(i1-1)*numel(var_test)); hold on
plot(tim.(var_test{i2}),tim.(var_test{i1}),'.r');
plot(t.(var_test{i2}),t.(var_test{i1}),'.k');
if i2==i1+1, xlabel(strrep(var_test{i2},'_',' ')); ylabel(strrep(var_test{i1},'_',' ')); end
box on;
end
end
Fill missing powerCurve
We parametrize the function with its max value: its position and its amount
figure('position',[0 0 900 300]);
plot(powerCurve.ws_smooth,(powerCurve.Power_smooth'./max(powerCurve.Power_smooth,[],2)'))
xlabel('windspeed [m/s]'); ylabel('Power'); title('Power curve of each turbine normalized by its max value')
[maxTimPower,maxTimPowerID] = max(t.powerCurve,[],2); maxTimPowerID(maxTimPowerID==1)=nan; % maxTimPowerID==1 is when there are no data
figure('position',[0 0 900 300]);
subplot(1,3,1);plot(maxTimPower,tim.turbine_capacity,'.k');l=lsline; l.Color='r'; xlabel('Max Power Curve'); ylabel('Turbine Capacity')
subplot(1,3,2);plot(maxTimPower,tim.turbine_sweptArea,'.k');l=lsline; l.Color='r'; xlabel('Max Power Curve'); ylabel('Turbine sweptArea')
subplot(1,3,3);plot(maxTimPower,tim.Hub_height,'.k');l=lsline; l.Color='r'; xlabel('Max Power Curve'); ylabel('Hub height')
The Max Power Curve value is well related to the three turbine caracteristic. We can use again a GLM for fill the missing value.
figure('position',[0 0 900 300]);
subplot(1,3,1);plot(powerCurve.ws_smooth(maxTimPowerID(~isnan(maxTimPowerID))),tim.turbine_capacity(~isnan(maxTimPowerID)),'.k');l=lsline; l.Color='r'; xlabel('Speed of Max Power'); ylabel('Turbine Capacity')
subplot(1,3,2);plot(powerCurve.ws_smooth(maxTimPowerID(~isnan(maxTimPowerID))),tim.turbine_sweptArea(~isnan(maxTimPowerID)),'.k');l=lsline; l.Color='r'; xlabel('Speed of Max Power'); ylabel('Turbine sweptArea')
subplot(1,3,3);plot(powerCurve.ws_smooth(maxTimPowerID(~isnan(maxTimPowerID))),tim.Hub_height(~isnan(maxTimPowerID)),'.k');l=lsline; l.Color='r'; xlabel('Speed of Max Power'); ylabel('Hub height')