Windfarm Processing

Processing of the windfarm database consists of filtering the data, adding powercurve, and adding missing data
Table of Contents
Add functions to the path
addpath('functions')

Load Windturbine Data

filename = 'TheWindPowerDatabaseWithRotorParameters23Mar.csv';
opts = detectImportOptions(['WT_data/' filename],'Encoding','UTF-8');
t=readtable(['WT_data/' filename],opts);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.

Basic Filtering

Remove offshore turbine
t(~strcmp(t.Offshore_Shore_distance,'No'),:)=[];
Remove turbine not in production (i.e., Construction, Planned, Approved or dismantled)
t(~strcmp(t.Status,'Production'),:)=[];
Remove data without location information
t(isnan(t.Latitude) | isnan(t.Longitude),:)=[];
Remove data outside of study area as defined by the bird datase.
To do that, we first assign to each windfarm the grid id of its location in the bird database and then we filter for windfarm further than 50km from the closest grid id.
load('../2018/data/Density_estimationMap','g');
[dist,grid_id] = min(pdist2([g.lat2D(g.latlonmask) g.lon2D(g.latlonmask)],[t.Latitude t.Longitude],@lldistkm));
t.grid_id = grid_id';
t(dist>50,:)=[];
Display table
t
t = 12518×34 table
 IDContinentISO_codeCountryState_codeAreaCityNameX2nd_nameLatitudeLongitudeAltitude_DepthLocation_accuracyOffshore_Shore_distanceManufacturerTurbineHub_heightNumber_of_turbinesTotal_powerDeveloperOperatorOwnerCommissioning_dateStatusDecommissioning_dateLinkUpdateturbine_maketurbine_capacityturbine_rotorDiameter
11256'Europe''AT''Austria''NA''Niederösterreich''Ertl''Ertl''NA'47.967014.6600'NA''Yes''No''Enercon''E40/500'651500'NA''Prenn Franz''NA'2000'Production''''Link'29-Apr-0019'Enercon'50040.3000
21247'Europe''AT''Austria''NA''Oberösterreich''Eberschwang''Eberschwang''NA'48.145013.5190'620''Yes''No''Enercon''E40/500'5021000'Windkraft Innviertel GmbH/EWS Consulting''Windkraft Innviertel GmbH''NA'NaN'Production''''Link'29-Apr-0019'Enercon'50040.3000
31252'Europe''AT''Austria''NA''Oberösterreich''Laussa''Laussa''Plattenberg'47.952014.4920'NA''Yes''No''Tacke''TW-600'5031800'EWS Consulting''Energie AG''Erneuerbare Energie Laussa GmbH'1996'Production''''Link'29-Apr-0019'Tacke'60043
416707'Europe''AT''Austria''NA''Oberösterreich''Munderfing''Munderfing''Kobernaußerwald'48.055013.2420'NA''Yes''No''Vestas''V112/3000'140515000'NA''NA''Municipality of Munderfing/EWS Consulting'NaN'Production''''Link'29-Apr-0019'Vestas'3000112
51251'Europe''AT''Austria''NA''Oberösterreich''Oberrödham''Oberrödham''NA'48.330013.6620'NA''Yes''No''Vestas''V47/660'6531980'EWS Consulting''Energie AG''NA'1999'Production''''Link'29-Apr-0019'Vestas'66047
61254'Europe''AT''Austria''NA''Oberösterreich''Schenkenfelden''Schenkenfelden''NA'48.510014.3500'NA''Yes''No''Micon''M1800-600/48'6021200'EWS Consulting''Future Energy GmbH''WEB Windenergie AG'1998'Production''''Link'20-Feb-0020'Micon'60048
71249'Europe''AT''Austria''NA''Oberösterreich''Schernham''Schernham''NA'48.179013.6090'NA''Yes''No''Enercon''E66/1800'8611800'Windkraft Innviertel GmbH''Windkraft Innviertel GmbH''NA'NaN'Production''''Link'29-Apr-0019'Enercon'180070
81257'Europe''AT''Austria''NA''Oberösterreich''Spörbichl''Spörbichl''NA'48.552014.6000'NA''Yes''No''Vestas''V47/660'6521320'EWS Consulting''New Energy/Austrian Wind Power GmbH''NA'1999'Production''''Link'29-Apr-0019'Vestas'66047
91248'Europe''AT''Austria''NA''Oberösterreich''Steiglberg''Steiglberg''NA'48.096013.3560'NA''Yes''No''Vestas''V80/2000'10012000'EWS Consulting''NA''Windkraft Simonsfeld GmbH & Co KG'2002'Production''''Link'16-Sep-0019'Vestas'200080
101250'Europe''AT''Austria''NA''Oberösterreich''Steindlberg''Steindlberg''NA'48.322013.6660'NA''Yes''No''Vestas''V47/660'6521320'EWS Consulting''Energy from A to Z Gmbh & Co KGa''NA'2001'Production''''Link'29-Apr-0019'Vestas'66047
111253'Europe''AT''Austria''NA''Oberösterreich''Vorderweissenbach''Sternwald I''Sternwind'48.579014.2310'NA''Yes''No''Vestas''V80/2000'10012000'Sternwind/ImWind''Sternwind/WEB Windenergie AG''Sternwind/WEB Windenergie AG'NaN'Production''''Link'20-Feb-0020'Vestas'200080
1210444'Europe''AT''Austria''NA''Oberösterreich''Vorderweissenbach''Sternwald II''Sternwind'48.579014.2310'NA''Yes''No''Vestas''V90/2000'105612000'Sternwind/EWS Consulting/ImWind''Sternwind/WEB Windenergie AG''Sternwind/WEB Windenergie AG'NaN'Production''''Link'20-Feb-0020'Vestas'200090
1322495'Europe''AT''Austria''NA''Oberösterreich''Vorderweissenbach''Sternwald III''NA'48.584014.2200'NA''Yes''No''Vestas''V112/3000'NaN26000'WEB Windenergie AG''Sternwind/WEB Windenergie AG''Sternwind/WEB Windenergie AG'2016'Production''''Link'20-Feb-0020'Vestas'3000112
1423831'Europe''BE''Belgium''NA''Antwerpen (Vlaanderen)''Antwerpen''Antwerpen BASF''Zandvliet BASF'51.36304.2970'NA''Yes''No''Enercon''E82/2000'98612000'Electrabel''Engie''Engie'NaN'Production''''Link'23-Mar-0020'Enercon'200082

Load Power Curve data

Power curve data provide the relationship to convert windspeed into power production. This data is downloaded from ...
filename = 'Power_curves_20210319.xls';
opts = detectImportOptions(['WT_data/' filename],'Sheet','Power_curves');
tp=readtable(['WT_data/' filename],opts);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
powerCurve.ws = table2array(tp(1,6:end-2));
powerCurve.ManufucturerName = tp.ManufucturerName(2:end);
powerCurve.TurbineName = tp.TurbineName(2:end);
powerCurve.Power = table2array(tp(2:end,6:end-2));

Smoothing Power-curve

Because windspeed is more heterogenous (in time) than the reanalysis data, it is standard procedure to smooth the powercurve with a Gaussian of mean 0 and standard deviation proportionaly to the windspeed. Here are some equation used in the litterature
As our wind databse has a 1hour resolution, we follow Staffell & Pfenninger (2016)
powerCurve.ws_smooth=powerCurve.ws(1):0.1:powerCurve.ws(end);
powerCurve.Power_smooth = nan(size(powerCurve.Power,1),numel(powerCurve.ws_smooth));
for i_p=1:size(powerCurve.Power,1)
p = interp1(powerCurve.ws,powerCurve.Power(i_p,:),powerCurve.ws_smooth);
% figure; plot(powerCurve.ws,powerCurve.Power(i_p,:),'or',powerCurve.ws_smooth,p,'-k')
for i_w=1:numel(powerCurve.ws_smooth)
s = 0.6+0.2*powerCurve.ws_smooth(i_w);
weight = normpdf(powerCurve.ws_smooth,powerCurve.ws_smooth(i_w),s);
weight = weight ./sum(weight);
powerCurve.Power_smooth(i_p,i_w)=weight*p';
end
% figure; hold on; plot(powerCurve.ws,powerCurve.Power(i_p,:)); plot(powerCurve.ws_smooth,powerCurve.Power_smooth(i_p,:));
end

Add Powercurve to windturbine

We assign the power curve to each windfarm
[~,Locb] = ismember(string([t.Turbine t.Manufacturer]) , string([powerCurve.TurbineName powerCurve.ManufucturerName]),'rows');
% t2=t(Locb==0,:);
t.powerCurve = nan(size(t,1),numel(powerCurve.ws_smooth));
t.powerCurve(Locb~=0,:) = powerCurve.Power_smooth(Locb(Locb~=0),:);
It would be nicer to do use the join function, but thes function requires that all windfarms have a matching powercurve. Hence, we use ismemeber()
%t2=join(t,tp,'LeftKeys','ID','RightKeys','Turb_ID');

Analysis of cut-in and cut-off

Compute the cut-in and cut-off value for each powercurve
Cutin = powerCurve.ws(sum(cumprod(powerCurve.Power==0,2),2));
Cutoff = powerCurve.ws(length(powerCurve.ws)-sum(cumprod(fliplr(powerCurve.Power)==0,2),2));
To plot the histogram of cut-in and cut-off, we need to accounts for each windfarm size (number of windturbines).
Cutin_h = repelem(Cutin(Locb(Locb~=0)),t.Number_of_turbines(Locb~=0));
Cutoff_h = repelem(Cutoff(Locb(Locb~=0)),t.Number_of_turbines(Locb~=0));
figure('position',[0 0 1000 300]);
subplot(1,2,1)
histogram(Cutin_h); xlabel('windspeed'); ylabel('Number of windturbine'); title('cut-in'); legend(['M=' num2str(mean(Cutin_h))])
subplot(1,2,2)
histogram(Cutoff_h); xlabel('windspeed'); title('cut-off'); legend(['M=' num2str(mean(Cutoff_h))])
%Cutin(Locb(Locb~=0)).*t.Number_of_turbines(Locb~=0).*t.turbine_sweptArea(Locb~=0))

Data Imputation

Several method exist for missing data: median, GLM or kNN (or more sofisticated like MICE).
Illustration of available data
var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power','powerCurve'};
figure('position',[0 0 1000 1000]);
subplot(6,1,1); histogram(t.Number_of_turbines); xlabel('Number of turbines'); ylim([0.5 10000]);set(gca, 'YScale', 'log')
subplot(6,1,2); histogram(t.turbine_sweptArea); xlabel('Swept Area'); ylim([0.5 10000]); set(gca, 'YScale', 'log')
subplot(6,1,3); histogram(t.Hub_height); xlabel('Hub height'); axis tight;set(gca, 'YScale', 'log')
subplot(6,1,4); histogram(t.turbine_capacity); xlabel('Turbine Capacity'); axis tight;set(gca, 'YScale', 'log')
subplot(6,1,[5 6]); hold on;
plot(powerCurve.ws_smooth, nanmean(t.powerCurve/1000),'k');
plot(powerCurve.ws_smooth, nanmedian(t.powerCurve/1000),'-.k');
plot(powerCurve.ws_smooth, quantile(t.powerCurve/1000,0.1),'--k');
plot(powerCurve.ws_smooth, quantile(t.powerCurve/1000,.9),'--k');
ax=gca; ylabel('Power MW'); xlabel('Windspeed')
var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power','powerCurve'};
a=sprintf([num2str(size(t,1)) '\t windfarms\n' ]);
for i=1:numel(var_test)
a=[a sprintf([num2str(sum(all(isnan(t.(var_test{i})),2))) '\t windfarms without ' var_test{i} '\n' ])];
end
disp(a)
12518 windfarms 166 windfarms without turbine_sweptArea 168 windfarms without turbine_capacity 4351 windfarms without Hub_height 154 windfarms without Number_of_turbines 15 windfarms without Total_power 3259 windfarms without powerCurve
Old Illustration
% var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power'};
% figure('position',[0 0 1000 1000]);
% for i1=1:numel(var_test)
% for i2=(i1+1):numel(var_test)
% subplot(numel(var_test),numel(var_test),i2+(i1-1)*numel(var_test)); hold on
% plot(t.(var_test{i2}),t.(var_test{i1}),'.k');
% if i2==i1+1, xlabel(strrep(var_test{i2},'_',' ')); ylabel(strrep(var_test{i1},'_',' ')); end
% l=lsline; l.Color='r';
% box on;
% end
% end
Create new variable for imputated value
tim=t;

Windfarm level: Number_of_turbines Turbine_capacity and Total_power

First, we deal with value at the windfarm level. Three variables are linked, with total power = number_of_turbine x turbine_capacity.
Check occurane of nan with either/or these two variable.
X=[tim.Number_of_turbines tim.Total_power tim.turbine_capacity];
[a1,~,c] = unique(isnan(X),'rows');
table(~a1(:,1), ~a1(:,2), ~a1(:,3) ,histcounts(c)','variableNames',{'Number of turbine','Total power','Turbine capacity','Number'})
ans = 4×4 table
 Number of turbineTotal powerTurbine capacityNumber
111112350
210014
3010153
40001
For all the missing windfarm, we missing two variables, so we can't fill them up easily.
We first fill-up Number_of_turbines and Total_power at the same time.We perform the prediction for both variable with a fglm
mdl = fitglm(tim.Number_of_turbines, tim.Total_power);
mdl2 = fitglm(tim.Total_power,tim.Number_of_turbines);
[ypred,yci] = predict(mdl,tim.Number_of_turbines(isnan(tim.Total_power),:));
tim.Total_power(isnan(tim.Total_power)) = ypred;
[ypred,yci] = predict(mdl2,tim.Total_power(isnan(tim.Number_of_turbines),:));
tim.Number_of_turbines(isnan(tim.Number_of_turbines)) = round(ypred);
Ilustration
figure('position',[0 0 500 300]); hold on; box on;
plot(t.Total_power,t.Number_of_turbines,'.k'); l=lsline; l.Color='r';
plot(tim.Total_power(isnan(t.Total_power)|isnan(t.Number_of_turbines)),tim.Number_of_turbines(isnan(t.Total_power)|isnan(t.Number_of_turbines)),'.r');
xlabel('Total_power'); ylabel('Number_of_turbines');
Then, we fill-up turbine capacity
Wind turbine capacity is directly related to total power and number of windturbine
tim.turbine_capacity(isnan(tim.turbine_capacity)) = tim.Total_power(isnan(tim.turbine_capacity)) ./ tim.Number_of_turbines(isnan(tim.turbine_capacity));
Illustration
figure; hold on;
plot(t.Total_power ./ t.Number_of_turbines , t.turbine_capacity , '.k' ); l=lsline; l.Color='r';
plot(tim.Total_power(isnan(t.turbine_capacity)) ./ tim.Number_of_turbines(isnan(t.turbine_capacity)) , tim.turbine_capacity(isnan(t.turbine_capacity)) , '.r' );
xlabel('Total_power / Number_of_turbines'); ylabel('turbine_capacity');

Windturbine level: Hub_height and turbine_capacity

At the windturbine level, we are missing a lot of hub height. Hub height is well related to turbine capacity and swept area.
Check occurane of nan for the three variable
X=[tim.turbine_sweptArea tim.turbine_capacity tim.Hub_height];
[a1,~,c] = unique(isnan(X),'rows');
table(~a1(:,1), ~a1(:,2), ~a1(:,3) ,histcounts(c)','variableNames',{'Swept area','Turbine capacity','hub height','Number'})
ans = 4×4 table
 Swept areaTurbine capacityhub heightNumber
11118167
21104185
3010165
40001
Hub height can be determined from swept area and/or turbine capacity for 4K cases.
First, we build a glm with turbine_capacity and turbine_sweptArea and predict only for nan value in Hub_height:
mdl = fitglm([tim.turbine_sweptArea tim.turbine_capacity],tim.Hub_height);
[ypred,yci] = predict(mdl,[tim.turbine_sweptArea(isnan(tim.Hub_height),:) tim.turbine_capacity(isnan(tim.Hub_height),:)]);
ypred(ypred>max(tim.Hub_height))=max(tim.Hub_height); % Avoid missleading value of hubheight
tim.Hub_height(isnan(tim.Hub_height)) = ypred;
Then, we build a glm with turbine_capacity and predict only for nan value in Hub_height:
mdl = fitglm(tim.turbine_capacity,tim.Hub_height);
[ypred,yci] = predict(mdl,tim.turbine_capacity(isnan(tim.Hub_height),:));
% ypred(ypred>max(tim.Hub_height))=max(tim.Hub_height); % Avoid missleading value of hubheight
tim.Hub_height(isnan(tim.Hub_height)) = ypred;
Finally, we build a glm with turbine_capacity and predict only for nan value in turbine_sweptArea:
mdl = fitglm(tim.turbine_capacity,tim.turbine_sweptArea);
[ypred,yci] = predict(mdl,tim.turbine_capacity(isnan(tim.turbine_sweptArea),:));
% ypred(ypred>max(tim.Hub_height))=max(tim.Hub_height); % Avoid missleading value of hubheight
tim.turbine_sweptArea(isnan(tim.turbine_sweptArea)) = ypred;
Ilustration
figure('position',[0 0 1000 300]);
subplot(1,2,1); hold on
plot(t.turbine_sweptArea,t.Hub_height,'.k'); l=lsline; l.Color='r';
plot(tim.turbine_sweptArea(isnan(t.Hub_height)),tim.Hub_height(isnan(t.Hub_height)),'.r');
xlabel('turbine_sweptArea'); ylabel('Hub_height');
box on;
subplot(1,2,2); hold on
plot(t.turbine_capacity,t.Hub_height,'.k'); l=lsline; l.Color='r';
plot(tim.turbine_capacity(isnan(t.Hub_height)),tim.Hub_height(isnan(t.Hub_height)),'.r');
xlabel('turbine_capacity'); ylabel('Hub_height');
box on;
We can then assess the missing value in the new imputated table
var_test = {'turbine_sweptArea','turbine_capacity','Hub_height','Number_of_turbines','Total_power'};
a=sprintf([num2str(size(tim,1)) '\t windfarms\n' ]);
for i=1:numel(var_test)
a=[a sprintf([num2str(sum(all(isnan(tim.(var_test{i})),2))) '\t windfarms without ' var_test{i} '\n' ])];
end
disp(a)
12518 windfarms 1 windfarms without turbine_sweptArea 1 windfarms without turbine_capacity 1 windfarms without Hub_height 1 windfarms without Number_of_turbines 1 windfarms without Total_power
The remaining missing data (if any) can be filled with the median value.
for i=1:numel(var_test)
tim.(var_test{i})(isnan(tim.(var_test{i})))=nanmedian(tim.(var_test{i}));
end

Power Curve

To input the power curve, we parametrize the power curve function based on its max value which corresponds to the windturbine capacity.
We first build a power curve template based on the median value of all windfarm (note that we use tim here)
powerCurve_template = nanmedian(tim.powerCurve'./max(tim.powerCurve,[],2)',2);
figure('position',[0 0 900 300]); hold on
plot(powerCurve.ws_smooth,(powerCurve.Power_smooth'./max(powerCurve.Power_smooth,[],2)'),'color',[.8 .8 .8])
plot(powerCurve.ws_smooth,powerCurve_template,'k','linewidth',2)
xlabel('windspeed [m/s]'); ylabel('Power'); title('Power curve normalized by its max value')
The maximum value of the powercurve is well related to the turbine capacity
Use GLM to impute max powercurve based on turbine capacity
mdl = fitglm(tim.turbine_capacity,max(tim.powerCurve,[],2) );
[ypred,yci] = predict(mdl,tim.turbine_capacity(isnan(max(tim.powerCurve,[],2)),:));
% ypred(ypred>max(tim.powerCurve(:)))=max(tim.powerCurve(:));
tim.powerCurve(isnan(max(tim.powerCurve,[],2)),:) = powerCurve_template' .* ypred;
figure; hold on; box on;
plot(max(t.powerCurve,[],2),t.turbine_capacity,'.k');l=lsline; l.Color='r';
plot(max(tim.powerCurve(isnan(max(t.powerCurve,[],2)),:),[],2),t.turbine_capacity(isnan(max(t.powerCurve,[],2))),'.r');
xlabel('Max Power Curve'); ylabel('Turbine Capacity')

Windfarm database characteristics

disp(['Number of Windfarms: ' num2str(height(tim))])
Number of Windfarms: 12518
disp(['Number of Windturbines: ' num2str(nansum(tim.Number_of_turbines))])
Number of Windturbines: 42696
disp(['Total power: ' num2str(nansum(tim.Total_power*1000/1e9))])
Total power: 77.0891
Group by country
tim.Total_sweptArea = tim.turbine_sweptArea.*tim.Number_of_turbines;
groupsummary(tim,'Country','sum',{'Number_of_turbines' 'Total_power','Total_sweptArea'})
ans = 11×5 table
 CountryGroupCountsum_Number_of_turbinessum_Total_powersum_Total_sweptArea
1'Austria'1331479201.4496e+05
2'Belgium'26081118390505.2464e+06
3'Czech Republic'451242228906.5768e+05
4'Denmark'3104953838409.3967e+05
5'France'142176141.6036e+074.7636e+07
6'Germany'9703304725.3427e+071.5217e+08
7'Luxembourg'26741535004.4892e+05
8'Netherlands'624175531577108.3218e+06
9'Poland'5050911068553.6533e+06
10'Spain'477746394101.7896e+06
11'Switzerland'1937750502.0912e+05
*GroupCount=number of windfarm
Distance to cast
load('coastlines.mat')
tmp = pdist2([tim.Latitude tim.Longitude],[coastlat coastlon], @lldistkm);
tim.distCoast = min(tmp,[],2);
hh = repelem(tim.Hub_height,tim.Number_of_turbines);
figure('position',[0 0 800 400]);
histogram(hh); ylabel('Number of Windturbines'); xlabel('Hub Height [m]');
legend(['M=' num2str(mean(hh)) ' ; SD=' num2str(std(hh)) ])
rr = repelem(tim.turbine_rotorRadius,tim.Number_of_turbines);
hhhh = repelem(tim.Hub_height,tim.Number_of_turbines);
h=0:200;
tmp = real(2*sqrt(rr.^2 - (h-hhhh).^2));
tmp = tmp ./ sum(tmp,2) .* rr.^2*pi;
figure('position',[0 0 800 400]);
subplot(1,2,1); barh(h,nansum(tmp))
ylabel('Height above ground level'); xlabel('Sum of area [m]');
subplot(1,2,2); plot(h,cumsum(nansum(tmp))./sum(tmp(:),'omitnan'))
xlabel('Height above ground level'); grid on; axis tight; xline([60 150])
dd = repelem(sqrt(tim.turbine_sweptArea/pi)*2,tim.Number_of_turbines);
figure('position',[0 0 800 400]);
histogram(dd); ylabel('Number of Windturbines'); xlabel('Swept Diameter [m]');
legend(['M=' num2str(mean(dd)) ' ; SD=' num2str(std(dd)) ])

Save

save('data/windfarms_processed.mat', 'tim')