Fixing issues in the assumptions of OLS, step by step
Hi, I want to raise an issue related to knowing whether your OLS model is OK or not.

It turns out that MATLAB provides a stress-testing example that we can work through ourselves. The explanation below comes from the linked MathWorks example, and the full text is as follows.
This example shows how to work with consumer (retail) credit panel data to visualize observed default rates at different levels. It also shows how to fit a model to predict probabilities of default (PD) and lifetime PD values, and perform a stress-testing analysis.
The panel data set of consumer loans enables you to identify default rate patterns for loans of different ages, or years on books. You can use information about a score group to distinguish default rates for different score levels. In addition, you can use macroeconomic information to assess how the state of the economy affects consumer loan default rates.
A standard logistic regression model, a type of generalized linear model, is fitted to the retail credit panel data with and without macroeconomic predictors, using `fitLifetimePDModel` from Risk Management Toolbox™. Although the same model can be fitted using the `fitglm` function from Statistics and Machine Learning Toolbox™, the lifetime probability of default (PD) version of the model is designed for credit applications, and it supports lifetime PD prediction and model validation tools, including the discrimination and accuracy plots shown in this example. The example also describes how to fit a more advanced model, a generalized linear mixed-effects model, to account for panel data effects. However, the panel effects are negligible for the data set in this example, so the standard logistic model is preferred for efficiency.
The logistic regression model predicts probabilities of default for all score levels, years on books, and macroeconomic variable scenarios. There is a brief discussion on how to predict lifetime PD values, with pointers to additional functionality. The example shows model discrimination and model accuracy tools to validate and compare models. In the last section of this example, the logistic model is used for a stress-testing analysis: the model predicts probabilities of default for a given baseline scenario, as well as default probabilities for adverse and severely adverse macroeconomic scenarios.
For additional information, refer to Overview of Lifetime Probability of Default Models. See also the example Modeling Probabilities of Default with Cox Proportional Hazards, which follows the same workflow but uses Cox regression instead of logistic regression and also has information on the computation of lifetime PD and lifetime Expected Credit Loss (ECL).
The main data set (`data`) contains the following variables:

- `ID`: Loan identifier.
- `ScoreGroup`: Credit score at the beginning of the loan, discretized into three groups: `High Risk`, `Medium Risk`, and `Low Risk`.
- `YOB`: Years on books.
- `Default`: Default indicator. This is the response variable.
- `Year`: Calendar year.
There is also a small data set (`dataMacro`) with macroeconomic data for the corresponding calendar years:

- `Year`: Calendar year.
- `GDP`: Gross domestic product growth (year over year).
- `Market`: Market return (year over year).
The variables `YOB`, `Year`, `GDP`, and `Market` are observed at the end of the corresponding calendar year. The score group is a discretization of the original credit score when the loan started. A value of 1 for `Default` means that the loan defaulted in the corresponding calendar year.
There is also a third data set (`dataMacroStress`) with baseline, adverse, and severely adverse scenarios for the macroeconomic variables. This table is used for the stress-testing analysis.
This example uses simulated data, but the same approach has been successfully applied to real data sets.
Load the data and view the first 10 and last 10 rows of the table. The panel data is stacked, in the sense that observations for the same ID are stored in contiguous rows, creating a tall, thin table. The panel is unbalanced, because not all IDs have the same number of observations.
```matlab
load RetailCreditPanelData.mat
fprintf('\nFirst ten rows:\n')
disp(data(1:10,:))
```
```
First ten rows:
    ID    ScoreGroup     YOB    Default    Year
    __    ___________    ___    _______    ____
    1     Low Risk        1        0       1997
    1     Low Risk        2        0       1998
    1     Low Risk        3        0       1999
    1     Low Risk        4        0       2000
    1     Low Risk        5        0       2001
    1     Low Risk        6        0       2002
    1     Low Risk        7        0       2003
    1     Low Risk        8        0       2004
    2     Medium Risk     1        0       1997
    2     Medium Risk     2        0       1998
```
```matlab
fprintf('Last ten rows:\n')
disp(data(end-9:end,:))
```
```
Last ten rows:
     ID      ScoreGroup     YOB    Default    Year
    _____    ___________    ___    _______    ____
    96819    High Risk       6        0       2003
    96819    High Risk       7        0       2004
    96820    Medium Risk     1        0       1997
    96820    Medium Risk     2        0       1998
    96820    Medium Risk     3        0       1999
    96820    Medium Risk     4        0       2000
    96820    Medium Risk     5        0       2001
    96820    Medium Risk     6        0       2002
    96820    Medium Risk     7        0       2003
    96820    Medium Risk     8        0       2004
```
```matlab
nRows = height(data);
UniqueIDs = unique(data.ID);
nIDs = length(UniqueIDs);
fprintf('Total number of IDs: %d\n',nIDs)
fprintf('Total number of rows: %d\n',nRows)
```
```
Total number of IDs: 96820
Total number of rows: 646724
```
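As a quick check of the unbalance (a minimal sketch, not part of the original example), you can tabulate how many rows each ID has; the counts vary because loans leave the books at different times:

```matlab
% Sketch: count observations per ID to confirm the panel is unbalanced.
obsPerID = groupsummary(data,'ID');    % GroupCount = number of rows per ID
disp(unique(obsPerID.GroupCount)')     % the distinct panel lengths that occur
```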
Use the credit score group as a grouping variable to compute the observed default rate for each score group. For this, use the `groupsummary` function to compute the mean of the `Default` variable, grouping by the `ScoreGroup` variable. Plot the results on a bar chart. As expected, the default rate goes down as the credit quality improves.
```matlab
DefRateByScore = groupsummary(data,'ScoreGroup','mean','Default');
NumScoreGroups = height(DefRateByScore);
disp(DefRateByScore)
```
```
    ScoreGroup     GroupCount    mean_Default
    ___________    __________    ____________
    High Risk      2.0999e+05      0.017167
    Medium Risk    2.1743e+05     0.0086006
    Low Risk        2.193e+05     0.0046784
```
```matlab
bar(DefRateByScore.ScoreGroup,DefRateByScore.mean_Default*100)
title('Default Rate vs. Score Group')
xlabel('Score Group')
ylabel('Observed Default Rate (%)')
grid on
```
Next, compute default rates grouping by years on books (represented by the `YOB` variable). The resulting rates are conditional one-year default rates. For example, the default rate for the third year on books is the proportion of loans defaulting in the third year, relative to the number of loans that are in the portfolio past the second year. In other words, the default rate for the third year is the number of rows with `YOB = 3` and `Default = 1`, divided by the number of rows with `YOB = 3`, as the sketch below verifies.
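As a sanity check (a minimal sketch, assuming the `data` table loaded earlier), this definition can be computed directly for the third year:

```matlab
% Sketch: conditional one-year default rate for the third year on books.
numDefaultsY3 = sum(data.YOB==3 & data.Default==1);  % defaults in year 3
numAtRiskY3   = sum(data.YOB==3);                    % loans on books in year 3
fprintf('Year-3 default rate: %.6f\n',numDefaultsY3/numAtRiskY3)  % ~0.011168
```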
Plot the results. There is a clear downward trend, with default rates going down as the number of years on books increases. Years three and four have similar default rates. However, it is unclear from this plot whether this is a characteristic of the loan product or an effect of the macroeconomic environment.
```matlab
DefRateByYOB = groupsummary(data,'YOB','mean','Default');
NumYOB = height(DefRateByYOB);
disp(DefRateByYOB)
```
```
    YOB    GroupCount    mean_Default
    ___    __________    ____________
     1       96820        0.017507
     2       94535        0.012704
     3       92497        0.011168
     4       91068        0.010728
     5       89588       0.0085949
     6       88570        0.006413
     7       61689       0.0033231
     8       31957       0.0016272
```
```matlab
plot(double(DefRateByYOB.YOB),DefRateByYOB.mean_Default*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
grid on
```
Now, group by both score group and number of years on books, and then plot the results. The plot shows that all score groups behave similarly as time progresses, with a general downward trend. Years three and four are an exception to the downward trend: the rates flatten for the `High Risk` group, and go up in year three for the `Low Risk` group.
```matlab
DefRateByScoreYOB = groupsummary(data,{'ScoreGroup','YOB'},'mean','Default');
% Display output table to show the way it is structured
% Display only the first 10 rows, for brevity
disp(DefRateByScoreYOB(1:10,:))
```
```
    ScoreGroup     YOB    GroupCount    mean_Default
    ___________    ___    __________    ____________
    High Risk       1       32601        0.029692
    High Risk       2       31338        0.021252
    High Risk       3       30138        0.018448
    High Risk       4       29438        0.018276
    High Risk       5       28661        0.014794
    High Risk       6       28117        0.011168
    High Risk       7       19606       0.0056615
    High Risk       8       10094       0.0027739
    Medium Risk     1       32373        0.014302
    Medium Risk     2       31775        0.011676
```
```matlab
DefRateByScoreYOB2 = reshape(DefRateByScoreYOB.mean_Default,...
    NumYOB,NumScoreGroups);
plot(DefRateByScoreYOB2*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
grid on
```
The data contains three cohorts, or vintages: loans started in 1997, 1998, and 1999. No loan in the panel data started after 1999.
This section shows how to visualize the default rate for each cohort separately. The default rates for all cohorts are plotted, both against the number of years on books and against the calendar year. Patterns in the years on books suggest the loan product characteristics. Patterns in the calendar years suggest the influence of the macroeconomic environment.
From years two through four on books, the curves show different patterns for the three cohorts. When plotted against the calendar year, however, the three cohorts show similar behavior from 2000 through 2002. The curves flatten during that period.
```matlab
% Get IDs of 1997, 1998, and 1999 cohorts
IDs1997 = data.ID(data.YOB==1&data.Year==1997);
IDs1998 = data.ID(data.YOB==1&data.Year==1998);
IDs1999 = data.ID(data.YOB==1&data.Year==1999);
% IDs2000AndUp is unused, it is only computed to show that this is empty,
% no loans started after 1999
IDs2000AndUp = data.ID(data.YOB==1&data.Year>1999);
% Get default rates for each cohort separately
ObsDefRate1997 = groupsummary(data(ismember(data.ID,IDs1997),:),...
    'YOB','mean','Default');
ObsDefRate1998 = groupsummary(data(ismember(data.ID,IDs1998),:),...
    'YOB','mean','Default');
ObsDefRate1999 = groupsummary(data(ismember(data.ID,IDs1999),:),...
    'YOB','mean','Default');
% Plot against the years on books
plot(ObsDefRate1997.YOB,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(ObsDefRate1998.YOB,ObsDefRate1998.mean_Default*100,'-*')
plot(ObsDefRate1999.YOB,ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on
```
```matlab
% Plot against the calendar year
Year = unique(data.Year);
plot(Year,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(Year(2:end),ObsDefRate1998.mean_Default*100,'-*')
plot(Year(3:end),ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Calendar Year')
xlabel('Calendar Year')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on
```
After you visualize the data, you can build predictive models for the default rates.
Split the panel data into training and testing sets, defining these sets based on ID numbers.
```matlab
NumTraining = floor(0.6*nIDs);
rng('default'); % for reproducibility
TrainIDInd = randsample(nIDs,NumTraining);
TrainDataInd = ismember(data.ID,UniqueIDs(TrainIDInd));
TestDataInd = ~TrainDataInd;
```
The first model uses only the score group and number of years on books as predictors of the default rate $p$. The odds of defaulting are defined as $p/(1-p)$. The logistic model relates the logarithm of the odds, or log odds, to the predictors as follows:
$$\log\left(\frac{p}{1-p}\right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}\,YOB + \epsilon$$
$1_M$ is an indicator with a value of 1 for `Medium Risk` loans and 0 otherwise, and similarly $1_L$ for `Low Risk` loans. This is a standard way of handling a categorical predictor such as `ScoreGroup`. There is effectively a different constant for each risk level: $a_H$ for `High Risk`, $a_H+a_M$ for `Medium Risk`, and $a_H+a_L$ for `Low Risk`.
```matlab
ModelNoMacro = fitLifetimePDModel(data(TrainDataInd,:),'logistic',...
    'ModelID','No Macro','Description','Logistic model with YOB and score group, but no macro variables',...
    'IDVar','ID','LoanVars','ScoreGroup','AgeVar','YOB','ResponseVar','Default');
disp(ModelNoMacro.Model)
```
```
Compact generalized linear regression model:
    logit(Default) ~ 1 + ScoreGroup + YOB
    Distribution = Binomial

Estimated Coefficients:
                              Estimate       SE        tStat       pValue
                              ________    ________    _______    ___________
    (Intercept)               -3.2453     0.033768    -96.106              0
    ScoreGroup_Medium Risk     -0.7058    0.037103    -19.023     1.1014e-80
    ScoreGroup_Low Risk       -1.2893     0.045635    -28.253    1.3076e-175
    YOB                       -0.22693    0.008437    -26.897    2.3578e-159

388018 observations, 388014 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.83e+03, p-value = 0
```
For any row in the data, the value of $p$ is not observed; only a 0 or 1 default indicator is observed. The calibration finds the model coefficients, and the predicted values of $p$ for individual rows can be recovered with the `predict` function.
The `(Intercept)` coefficient is the constant for the `High Risk` level (the $a_H$ term), and the `ScoreGroup_Medium Risk` and `ScoreGroup_Low Risk` coefficients are the adjustments for the `Medium Risk` and `Low Risk` levels (the $a_M$ and $a_L$ terms).
The default probability $p$ and the log odds (the left side of the model) move in the same direction when the predictors change. Therefore, because the adjustments for `Medium Risk` and `Low Risk` are negative, the default rates are lower for better risk levels, as expected. The coefficient for number of years on books is also negative, consistent with the overall downward trend for years on books observed in the data.
An alternative way to fit the model is with the `fitglm` function from Statistics and Machine Learning Toolbox™. The formula above is expressed as

`Default ~ 1 + ScoreGroup + YOB`

The `1 + ScoreGroup` terms account for the baseline constant and the adjustments for risk level. Set the optional argument `Distribution` to `binomial` to indicate that a logistic model is desired (that is, a model with log odds on the left side), as follows:

```matlab
ModelNoMacro = fitglm(data(TrainDataInd,:),'Default ~ 1 + ScoreGroup + YOB','Distribution','binomial');
```
As mentioned in the introduction, the advantage of the lifetime PD version of the model fitted with `fitLifetimePDModel` is that it is designed for credit applications: it can predict lifetime PD and supports model validation tools, including the discrimination and accuracy plots. For more information, see Overview of Lifetime Probability of Default Models.
To account for panel data effects, a more advanced model using mixed effects can be fitted with the `fitglme` function from Statistics and Machine Learning Toolbox™. Although this model is not fitted in this example, the code is very similar (the model is stored in a separate variable here so it does not overwrite the fitted `ModelNoMacro`):

```matlab
ModelMixed = fitglme(data(TrainDataInd,:),'Default ~ 1 + ScoreGroup + YOB + (1|ID)','Distribution','binomial');
```
The `(1|ID)` term in the formula adds a random effect to the model. This effect is a predictor whose values are not given in the data, but calibrated together with the model coefficients. A random value is calibrated for each ID. This additional calibration requirement substantially increases the computational time to fit the model in this case, because of the very large number of IDs. For the panel data set in this example, the random term has a negligible effect: the variance of the random effects is very small, and the model coefficients barely change when the random effect is introduced. The simpler logistic regression model is preferred, because it is faster to calibrate and to predict, and the default rates predicted with both models are essentially the same.
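If you do fit the mixed-effects model, one way to confirm that the panel effect is negligible (a sketch, assuming the `ModelMixed` object from the `fitglme` call above) is to inspect the estimated variance of the random intercepts:

```matlab
% Sketch: inspect the random-effect variance of the fitted mixed-effects model.
% A variance near zero indicates the (1|ID) term contributes little.
psi = covarianceParameters(ModelMixed);   % covariance estimates for random effects
disp(psi{1})                              % variance of the random intercept per ID
```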
Predict the probability of default for the training and testing data. The `predict` function predicts conditional PD values, row by row. Store the predictions to compare them against the macro model in the next section.
```matlab
data.PDNoMacro = zeros(height(data),1);
% Predict in-sample
data.PDNoMacro(TrainDataInd) = predict(ModelNoMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDNoMacro(TestDataInd) = predict(ModelNoMacro,data(TestDataInd,:));
```
To make lifetime PD predictions, use the `predictLifetime` function. For lifetime predictions, projected values of the predictors are required for each ID value in the prediction data set. For example, predict the survival probability for the first two IDs in the data set. See how the conditional PD (`PDNoMacro` column) and the lifetime PD (`LifetimePD` column) match for the first year of each ID. After that year, the lifetime PD increases because it is a cumulative probability. For more information, see `predictLifetime`. See also the Expected Credit Loss Computation example.
```matlab
data1 = data(1:16,:);
data1.LifetimePD = predictLifetime(ModelNoMacro,data1);
disp(data1)
```
```
    ID    ScoreGroup     YOB    Default    Year    PDNoMacro    LifetimePD
    __    ___________    ___    _______    ____    _________    __________
    1     Low Risk        1        0       1997    0.0084797    0.0084797
    1     Low Risk        2        0       1998    0.0067697     0.015192
    1     Low Risk        3        0       1999    0.0054027     0.020513
    1     Low Risk        4        0       2000    0.0043105     0.024735
    1     Low Risk        5        0       2001    0.0034384     0.028088
    1     Low Risk        6        0       2002    0.0027422     0.030753
    1     Low Risk        7        0       2003    0.0021867     0.032873
    1     Low Risk        8        0       2004    0.0017435     0.034559
    2     Medium Risk     1        0       1997     0.015097     0.015097
    2     Medium Risk     2        0       1998     0.012069     0.026984
    2     Medium Risk     3        0       1999    0.0096422     0.036366
    2     Medium Risk     4        0       2000    0.0076996     0.043785
    2     Medium Risk     5        0       2001     0.006146     0.049662
    2     Medium Risk     6        0       2002    0.0049043     0.054323
    2     Medium Risk     7        0       2003    0.0039125     0.058023
    2     Medium Risk     8        0       2004    0.0031207     0.060962
```
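The cumulative relationship can be verified by hand (a minimal sketch; the survival-probability formula here is an assumption consistent with the output above): the lifetime PD at year $t$ is one minus the product of the conditional survival probabilities up to $t$.

```matlab
% Sketch: LifetimePD(t) = 1 - prod_{s<=t}(1 - PD(s)), verified for ID 1.
condPD = data1.PDNoMacro(data1.ID==1);   % conditional one-year PDs for ID 1
lifetimePD = 1 - cumprod(1 - condPD);    % cumulative probability of default
disp([condPD lifetimePD])                % matches the LifetimePD column above
```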
Visualize the in-sample (training) or out-of-sample (test) fit using `modelAccuracyPlot`. It requires a grouping variable to compute observed default rates and average predicted PD values for each group. Use years on books as the grouping variable here.
```matlab
DataSetChoice = "Test";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end
modelAccuracyPlot(ModelNoMacro,data(Ind,:),'YOB','DataID',DataSetChoice)
```
The score group can be input as a second grouping variable to visualize the fit by score groups.
```matlab
modelAccuracyPlot(ModelNoMacro,data(Ind,:),{'YOB' 'ScoreGroup'},'DataID',DataSetChoice)
```
Lifetime PD models also support validation tools for model discrimination. In particular, the `modelDiscriminationPlot` function creates the receiver operating characteristic (ROC) curve plot. Here a separate ROC curve is requested for each score group. For more information, see `modelDiscriminationPlot`.
```matlab
modelDiscriminationPlot(ModelNoMacro,data(Ind,:),'SegmentBy','ScoreGroup','DataID',DataSetChoice)
```
The trend predicted with the previous model, as a function of years on books, has a very regular decreasing pattern. The data, however, shows some deviations from that trend. To try to account for those deviations, add the gross domestic product annual growth (represented by the `GDP` variable) and stock market annual returns (represented by the `Market` variable) to the model:
$$\log\left(\frac{p}{1-p}\right) = a_H + a_M 1_M + a_L 1_L + b_{YOB}\,YOB + b_{GDP}\,GDP + b_{Market}\,Market + \epsilon$$
Expand the data set to add one column for `GDP` and one for `Market`, using the data from the `dataMacro` table.
```matlab
data = join(data,dataMacro);
disp(data(1:10,:))
```
```
    ID    ScoreGroup     YOB    Default    Year    PDNoMacro     GDP     Market
    __    ___________    ___    _______    ____    _________    _____    ______
    1     Low Risk        1        0       1997    0.0084797     2.72      7.61
    1     Low Risk        2        0       1998    0.0067697     3.57     26.24
    1     Low Risk        3        0       1999    0.0054027     2.86      18.1
    1     Low Risk        4        0       2000    0.0043105     2.43      3.19
    1     Low Risk        5        0       2001    0.0034384     1.26    -10.51
    1     Low Risk        6        0       2002    0.0027422    -0.59    -22.95
    1     Low Risk        7        0       2003    0.0021867     0.63      2.78
    1     Low Risk        8        0       2004    0.0017435     1.85      9.48
    2     Medium Risk     1        0       1997     0.015097     2.72      7.61
    2     Medium Risk     2        0       1998     0.012069     3.57     26.24
```
Fit the model with the macroeconomic variables, or macro model, by expanding the model formula to include the `GDP` and `Market` variables.
```matlab
ModelMacro = fitLifetimePDModel(data(TrainDataInd,:),'logistic',...
    'ModelID','Macro','Description','Logistic model with YOB, score group and macro variables',...
    'IDVar','ID','LoanVars','ScoreGroup','AgeVar','YOB',...
    'MacroVars',{'GDP','Market'},'ResponseVar','Default');
disp(ModelMacro.Model)
```
```
Compact generalized linear regression model:
    logit(Default) ~ 1 + ScoreGroup + YOB + GDP + Market
    Distribution = Binomial

Estimated Coefficients:
                               Estimate        SE        tStat       pValue
                              __________    _________    _______    ___________
    (Intercept)                   -2.667      0.10146    -26.287    2.6919e-152
    ScoreGroup_Medium Risk      -0.70751     0.037108    -19.066     4.8223e-81
    ScoreGroup_Low Risk          -1.2895     0.045639    -28.253    1.2892e-175
    YOB                         -0.32082     0.013636    -23.528    2.0867e-122
    GDP                         -0.12295     0.039725     -3.095      0.0019681
    Market                    -0.0071812    0.0028298    -2.5377       0.011159

388018 observations, 388012 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.97e+03, p-value = 0
```
Both macroeconomic variables show a negative coefficient, consistent with the intuition that higher economic growth reduces default rates.
Use the `predict` function to predict the conditional PD. For illustration, here is how to predict the conditional PD on training and testing data using the macro model. The results are stored as a new column in the `data` table. Lifetime PD prediction is also supported with the `predictLifetime` function, as shown in the Model of Default Rates Using Score Group and Years on Books section.
```matlab
data.PDMacro = zeros(height(data),1);
% Predict in-sample
data.PDMacro(TrainDataInd) = predict(ModelMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDMacro(TestDataInd) = predict(ModelMacro,data(TestDataInd,:));
```
The model accuracy and discrimination plots offer readily available comparison tools for the models.
Visualize the in-sample or out-of-sample fit using `modelAccuracyPlot`. Pass the predictions from the model without macroeconomic variables as a reference model. Plot both using years on books as the single grouping variable first, and then using score group as a second grouping variable.
```matlab
DataSetChoice = "Test";
if DataSetChoice=="Training"
    Ind = TrainDataInd;
else
    Ind = TestDataInd;
end
modelAccuracyPlot(ModelMacro,data(Ind,:),'YOB','ReferencePD',data.PDNoMacro(Ind),...
    'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
```
```matlab
modelAccuracyPlot(ModelMacro,data(Ind,:),{'YOB','ScoreGroup'},'ReferencePD',data.PDNoMacro(Ind),...
    'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
```
The accuracy of the predictions significantly improves compared to the model with no macroeconomic variables. The predicted conditional PD values more closely follow the pattern of the observed default rates and the root mean square error (RMSE) reported is significantly smaller when the macroeconomic variables are included in the model.
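The reported RMSE can also be approximated by hand (a minimal sketch; this assumes `modelAccuracyPlot` compares observed group default rates with mean predicted PDs, grouping by `YOB` as above):

```matlab
% Sketch: grouped RMSE of predicted PD vs. observed default rate, by YOB.
obs = groupsummary(data(Ind,:),'YOB','mean',{'Default','PDNoMacro','PDMacro'});
rmseNoMacro = sqrt(mean((obs.mean_Default - obs.mean_PDNoMacro).^2));
rmseMacro   = sqrt(mean((obs.mean_Default - obs.mean_PDMacro).^2));
fprintf('RMSE no-macro: %.6f\nRMSE macro:    %.6f\n',rmseNoMacro,rmseMacro)
```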
Plot the ROC curves of the macro model and the model without macroeconomic variables to compare their performance with regard to model discrimination.
```matlab
modelDiscriminationPlot(ModelMacro,data(Ind,:),'ReferencePD',data.PDNoMacro(Ind),'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
```
Discrimination measures how well a model ranks customers by risk. Both models perform similarly, with only a slight improvement when the macroeconomic variables are added to the model. This means both models do a similar job of separating low-risk, medium-risk, and high-risk customers by assigning higher PD values to customers with higher risk.
Although the discrimination performance of both models is similar, the predicted PD values are more accurate for the macro model. Using both discrimination and accuracy tools is important for model validation and model comparison.
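To attach a number to the discrimination comparison, one option (a sketch using `perfcurve` from Statistics and Machine Learning Toolbox; this computation is not part of the original example) is to compute the area under the ROC curve for each model on the same test rows:

```matlab
% Sketch: AUROC for both models; higher values indicate better risk ranking.
[~,~,~,aucNoMacro] = perfcurve(data.Default(Ind),data.PDNoMacro(Ind),1);
[~,~,~,aucMacro]   = perfcurve(data.Default(Ind),data.PDMacro(Ind),1);
fprintf('AUROC no-macro: %.4f\nAUROC macro:    %.4f\n',aucNoMacro,aucMacro)
```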
Use the fitted macro model to stress-test the predicted probabilities of default.
Assume the following stress scenarios for the macroeconomic variables, provided, for example, by a regulator.
```matlab
disp(dataMacroStress)
```
```
                 GDP     Market
                _____    ______
    Baseline     2.27     15.02
    Adverse      1.31      4.56
    Severe      -0.22     -5.64
```
Set up a basic data table for predicting the probabilities of default. This is a dummy data table, with one row for each combination of score group and number of years on books.
```matlab
dataBaseline = table;
[ScoreGroup,YOB] = meshgrid(1:NumScoreGroups,1:NumYOB);
dataBaseline.ScoreGroup = categorical(ScoreGroup(:),1:NumScoreGroups,...
    categories(data.ScoreGroup),'Ordinal',true);
dataBaseline.YOB = YOB(:);
dataBaseline.ID = ones(height(dataBaseline),1);
dataBaseline.GDP = zeros(height(dataBaseline),1);
dataBaseline.Market = zeros(height(dataBaseline),1);
```
To make the predictions, set the same macroeconomic conditions (baseline, adverse, or severely adverse) for all combinations of score groups and number of years on books.
```matlab
% Predict the baseline probabilities of default
dataBaseline.GDP(:) = dataMacroStress.GDP('Baseline');
dataBaseline.Market(:) = dataMacroStress.Market('Baseline');
dataBaseline.PD = predict(ModelMacro,dataBaseline);
% Predict the probabilities of default in the adverse scenario
dataAdverse = dataBaseline;
dataAdverse.GDP(:) = dataMacroStress.GDP('Adverse');
dataAdverse.Market(:) = dataMacroStress.Market('Adverse');
dataAdverse.PD = predict(ModelMacro,dataAdverse);
% Predict the probabilities of default in the severely adverse scenario
dataSevere = dataBaseline;
dataSevere.GDP(:) = dataMacroStress.GDP('Severe');
dataSevere.Market(:) = dataMacroStress.Market('Severe');
dataSevere.PD = predict(ModelMacro,dataSevere);
```
Visualize the average predicted probability of default across score groups under the three alternative regulatory scenarios. Here, all score groups are implicitly weighted equally. However, predictions can also be made at a loan level for any given portfolio to make the predicted default rates consistent with the actual distribution of loans in the portfolio. The same visualization can be produced for each score group separately.
```matlab
PredPDYOB = zeros(NumYOB,3);
PredPDYOB(:,1) = mean(reshape(dataBaseline.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,2) = mean(reshape(dataAdverse.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,3) = mean(reshape(dataSevere.PD,NumYOB,NumScoreGroups),2);
figure;
bar(PredPDYOB*100);
xlabel('Years on Books')
ylabel('Predicted Default Rate (%)')
legend('Baseline','Adverse','Severe')
title('Stress Test, Probability of Default')
grid on
```
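If equal weighting of score groups is not appropriate for a given portfolio, the averages can instead be weighted by the observed number of loans in each group (a minimal sketch, not part of the original example, reusing the `GroupCount` values computed earlier):

```matlab
% Sketch: weight score groups by their loan counts instead of equally.
w = DefRateByScore.GroupCount/sum(DefRateByScore.GroupCount); % group weights
PDBaseByGroup = reshape(dataBaseline.PD,NumYOB,NumScoreGroups);
PredPDYOBWeighted = PDBaseByGroup*w;  % count-weighted baseline PD per YOB
```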
Generalized Linear Models documentation: https://www.mathworks.com/help/stats/generalized-linear-regression.html
Generalized Linear Mixed Effects Models documentation: https://www.mathworks.com/help/stats/generalized-linear-mixed-effects-models.html
Federal Reserve, Comprehensive Capital Analysis and Review (CCAR): https://www.federalreserve.gov/bankinforeg/ccar.htm
Bank of England, Stress Testing: https://www.bankofengland.co.uk/financial-stability
European Banking Authority, EU-Wide Stress Testing: https://www.eba.europa.eu/risk-analysis-and-data/eu-wide-stress-testing
%%
% 1. First, load the credit data
load RetailCreditPanelData.mat
fprintf('\nFirst ten rows:\n')
%%
% 2. Display the first ten rows
disp(data(1:10,:))
% Last ten rows:
disp(data(end-9:end,:))
%%
% 3. Display the number of unique IDs
nRows = height(data);
UniqueIDs = unique(data.ID);
nIDs = length(UniqueIDs);
fprintf('Total number of IDs: %d\n',nIDs)
% Total number of rows
fprintf('Total number of rows: %d\n',nRows)
%4. [ScoreGroup] Compute the summary of default rates grouped by score group
DefRateByScore = groupsummary(data,'ScoreGroup','mean','Default');
NumScoreGroups = height(DefRateByScore);
disp(DefRateByScore)
% [Image] Bar chart of the default rate by score group
bar(DefRateByScore.ScoreGroup,DefRateByScore.mean_Default*100)
title('Default Rate vs. Score Group')
xlabel('Score Group')
ylabel('Observed Default Rate (%)')
grid on
%5. [YOB] Find the default rate by years on books (YOB)
DefRateByYOB = groupsummary(data,'YOB','mean','Default');
NumYOB = height(DefRateByYOB);
disp(DefRateByYOB)
% [Image] Plot of the default rate by years on books
plot(double(DefRateByYOB.YOB),DefRateByYOB.mean_Default*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
grid on
%6. [ScoreGroup and YOB] Combine the score group and years on books
% to see how default rates vary across both
DefRateByScoreGroupYOB = groupsummary(data,{'ScoreGroup','YOB'},'mean','Default');
NumScoreGroupYOB = height(DefRateByScoreGroupYOB);
disp(DefRateByScoreGroupYOB(1:10,:))
% [Image] Plot the default rates by YOB for each score group
DefRateByScoreYOB2 = reshape(DefRateByScoreGroupYOB.mean_Default,...
NumYOB,NumScoreGroups);
plot(DefRateByScoreYOB2*100,'-*')
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Observed Default Rate (%)')
legend(categories(data.ScoreGroup))
grid on
%7. Cohort analysis: loans that started in 1997, 1998, and 1999
% Get IDs of 1997, 1998, and 1999 cohorts
IDs1997 = data.ID(data.YOB==1&data.Year==1997);
IDs1998 = data.ID(data.YOB==1&data.Year==1998);
IDs1999 = data.ID(data.YOB==1&data.Year==1999);
% IDs2000AndUp is unused, it is only computed to show that this is empty,
% no loans started after 1999
IDs2000AndUp = data.ID(data.YOB==1&data.Year>1999);
% Get default rates for each cohort separately
ObsDefRate1997 = groupsummary(data(ismember(data.ID,IDs1997),:),...
'YOB','mean','Default');
ObsDefRate1998 = groupsummary(data(ismember(data.ID,IDs1998),:),...
'YOB','mean','Default');
ObsDefRate1999 = groupsummary(data(ismember(data.ID,IDs1999),:),...
'YOB','mean','Default');
% Plot against the years on books
plot(ObsDefRate1997.YOB,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(ObsDefRate1998.YOB,ObsDefRate1998.mean_Default*100,'-*')
plot(ObsDefRate1999.YOB,ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Years on Books')
xlabel('Years on Books')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on
%8. Plot against the calendar year
Year = unique(data.Year);
plot(Year,ObsDefRate1997.mean_Default*100,'-*')
hold on
plot(Year(2:end),ObsDefRate1998.mean_Default*100,'-*')
plot(Year(3:end),ObsDefRate1999.mean_Default*100,'-*')
hold off
title('Default Rate vs. Calendar Year')
xlabel('Calendar Year')
ylabel('Default Rate (%)')
legend('Cohort 97','Cohort 98','Cohort 99')
grid on
%9. Model of default rates using score group and YOB: split into training and testing sets
NumTraining = floor(0.6*nIDs);
rng('default'); % for reproducibility
TrainIDInd = randsample(nIDs,NumTraining);
TrainDataInd = ismember(data.ID,UniqueIDs(TrainIDInd));
TestDataInd = ~TrainDataInd;
%10. Fit the logistic model without macro variables
ModelNoMacro = fitLifetimePDModel(data(TrainDataInd,:),'logistic',...
'ModelID','No Macro','Description','Logistic model with YOB and score group, but no macro variables',...
'IDVar','ID','LoanVars','ScoreGroup','AgeVar','YOB','ResponseVar','Default');
disp(ModelNoMacro.Model)
%11. Predict conditional PD with the no-macro model
data.PDNoMacro = zeros(height(data),1);
% Predict in-sample
data.PDNoMacro(TrainDataInd) = predict(ModelNoMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDNoMacro(TestDataInd) = predict(ModelNoMacro,data(TestDataInd,:));
% Use predictLifetime to compute lifetime PD for the first two IDs
data1 = data(1:16,:);
data1.LifetimePD = predictLifetime(ModelNoMacro,data1);
disp(data1)
% Visualize model accuracy on the training or test set
DataSetChoice = "Test";
if DataSetChoice=="Training"
Ind = TrainDataInd;
else
Ind = TestDataInd;
end
modelAccuracyPlot(ModelNoMacro,data(Ind,:),'YOB','DataID',DataSetChoice)
%The score group can be input as a second grouping variable to visualize the fit by score groups.
modelAccuracyPlot(ModelNoMacro,data(Ind,:),{'YOB' 'ScoreGroup'},'DataID',DataSetChoice)
%Lifetime PD models also support validation tools for model discrimination. In particular, the modelDiscriminationPlot function creates the receiver operating characteristic (ROC) curve plot. Here a separate ROC curve is requested for each score group. For more information, see modelDiscriminationPlot.
modelDiscriminationPlot(ModelNoMacro,data(Ind,:),'SegmentBy','ScoreGroup','DataID',DataSetChoice)
%12. Model of Default Rates Including Macroeconomic Variables
% 13. First, join the macro data onto the panel
data = join(data,dataMacro);
disp(data(1:10,:))
%14. Fit the model with the macroeconomic variables, or macro model, by expanding the model formula to include the GDP and the Market variables.
ModelMacro = fitLifetimePDModel(data(TrainDataInd,:),'logistic',...
'ModelID','Macro','Description','Logistic model with YOB, score group and macro variables',...
'IDVar','ID','LoanVars','ScoreGroup','AgeVar','YOB',...
'MacroVars',{'GDP','Market'},'ResponseVar','Default');
disp(ModelMacro.Model)
%15. Use the predict function to predict the conditional PD. For illustration, here is how to predict the conditional PD on training and testing data using the macro model. The results are stored as a new column in the data table. Lifetime PD prediction is also supported with the predictLifetime function, as shown in the Model of Default Rates Using Score Group and Years on Books section.
data.PDMacro = zeros(height(data),1);
% Predict in-sample
data.PDMacro(TrainDataInd) = predict(ModelMacro,data(TrainDataInd,:));
% Predict out-of-sample
data.PDMacro(TestDataInd) = predict(ModelMacro,data(TestDataInd,:));
%16. Visualize model accuracy, with the no-macro model as reference
DataSetChoice = "Test";
if DataSetChoice=="Training"
Ind = TrainDataInd;
else
Ind = TestDataInd;
end
modelAccuracyPlot(ModelMacro,data(Ind,:),'YOB','ReferencePD',data.PDNoMacro(Ind),'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
%17. Accuracy by YOB and score group
modelAccuracyPlot(ModelMacro,data(Ind,:),{'YOB','ScoreGroup'},'ReferencePD',data.PDNoMacro(Ind),'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
%18. Plot the ROC curve for model discrimination
modelDiscriminationPlot(ModelMacro,data(Ind,:),'ReferencePD',data.PDNoMacro(Ind),'ReferenceID',ModelNoMacro.ModelID,'DataID',DataSetChoice)
% 19. Stress testing of the probability of default
% Scenarios provided by the regulator
disp(dataMacroStress)
%20. Set up the baseline data table
dataBaseline = table;
[ScoreGroup,YOB]=meshgrid(1:NumScoreGroups,1:NumYOB);
dataBaseline.ScoreGroup = categorical(ScoreGroup(:),1:NumScoreGroups,...
categories(data.ScoreGroup),'Ordinal',true);
dataBaseline.YOB = YOB(:);
dataBaseline.ID = ones(height(dataBaseline),1);
dataBaseline.GDP = zeros(height(dataBaseline),1);
dataBaseline.Market = zeros(height(dataBaseline),1);
%21. To make the predictions, set the same macroeconomic conditions (baseline, adverse, or severely adverse) for all combinations of score groups and number of years on books.
% Predict the baseline probabilities of default
dataBaseline.GDP(:) = dataMacroStress.GDP('Baseline');
dataBaseline.Market(:) = dataMacroStress.Market('Baseline');
dataBaseline.PD = predict(ModelMacro,dataBaseline);
% Predict the probabilities of default in the adverse scenario
dataAdverse = dataBaseline;
dataAdverse.GDP(:) = dataMacroStress.GDP('Adverse');
dataAdverse.Market(:) = dataMacroStress.Market('Adverse');
dataAdverse.PD = predict(ModelMacro,dataAdverse);
% Predict the probabilities of default in the severely adverse scenario
dataSevere = dataBaseline;
dataSevere.GDP(:) = dataMacroStress.GDP('Severe');
dataSevere.Market(:) = dataMacroStress.Market('Severe');
dataSevere.PD = predict(ModelMacro,dataSevere);
%22. Visualize the stress-test results
PredPDYOB = zeros(NumYOB,3);
PredPDYOB(:,1) = mean(reshape(dataBaseline.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,2) = mean(reshape(dataAdverse.PD,NumYOB,NumScoreGroups),2);
PredPDYOB(:,3) = mean(reshape(dataSevere.PD,NumYOB,NumScoreGroups),2);
figure;
bar(PredPDYOB*100);
xlabel('Years on Books')
ylabel('Predicted Default Rate (%)')
legend('Baseline','Adverse','Severe')
title('Stress Test, Probability of Default')
grid on