PhD Chapter - All things related to regression in MATLAB

Today I will write down everything related to regression in MATLAB:

1. Simple regression 

2. Multiple regression

3. Polynomial regression

4. Logistic regression

5. What to do with missing data

1. Simple regression 

What we will learn from this code:

  1. Everything is easier to understand when we describe it with a picture.
  2. How the regression line is fitted to the data.
  3. How far each data point (dot) is from the line, i.e. the residuals.
% 
% Acknowledgement: Mike X Cohen, sincxpress.com
%


% a clear MATLAB workspace is a clear mental workspace
close all; clear, clc


%% example: effects of sleep on food spending

data = [
 5 47
 5.5 53
 6 52
 6 44
 7 39
 7 49
 7.5 50
 8 38
 8.5 43
 9 40
];


% start by showing the data
% hold on means that subsequent plot commands are added to the same figure
figure(1), clf, hold on
% x is in column 1 and y is in column 2
plot(data(:,1),data(:,2),'ko','markerfacecolor','k','markersize',10)
xlabel('Hours of sleep'), ylabel('Fijian dollars')
% get the current axes and set specific x and y limits,
% so the plot is zoomed in more clearly
set(gca,'xlim',[4 10],'ylim',[36 54])

%% "manual" regression via least-squares fitting

% create the design matrix
% desmat is the design matrix
% cat concatenates arrays along a given dimension
% dimension 1 stacks along rows (e.g., two 3x3 matrices become a 6x3 matrix)
% dimension 2 stacks along columns
% here a (10 x 1) column of ones and the (10 x 1) predictor become a 10 x 2 matrix
desmat = cat(2,ones(10,1),data(:,1));
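% quick illustration of the dim argument (hypothetical values, just for clarity):
%   A = [1 2; 3 4]; B = [5 6; 7 8];
%   cat(1,A,B) stacks along rows    -> a 4 x 2 matrix
%   cat(2,A,B) stacks along columns -> a 2 x 4 matrix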

% compute the beta parameters (regression coefficients)
beta = desmat\data(:,2);
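% the backslash operator solves the least-squares problem; it is equivalent to the
% normal equations, beta = inv(desmat'*desmat)*desmat'*data(:,2), but is more
% numerically stable and is the idiomatic MATLAB way to do it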

% predicted data values
yHat = desmat*beta;

%% show the predicted results on top of the "real" data

% predicted results
plot(data(:,1),yHat,'s-','color','b','markersize',10,'markerfacecolor','b')

% show the residuals
for i=1:10
 plot([1 1]*data(i,1),[data(i,2) yHat(i)],'m--')
end

h = legend({'Data (y)';'Model ($\hat{y}$)';'Residuals'},'Interpreter','latex');
set(h,'box','off')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% now with the stats toolbox
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[beta2,b_CI,resids,rint,stats] = regress(data(:,2),desmat);

% the stats vector consists of R^2, F, p-value, and error variance
stats

% compare the betas
[ beta beta2 ]

%% done.

Now let's apply the same approach to real data: Singapore GDP imported from an Excel file.

%1. First, clear the workspace
% a clear MATLAB workspace is a clear mental workspace
close all; clear; clc

%2. Import the data with xlsread
[num,txt,raw] = xlsread("SingaporeGDP.xlsx");


% convert the year column to numeric

%3. check sizes and outputs
whos

%4. Extract the variables from the imported data
% rows 2-43 of num hold the observations we want
GDP = num(2:43,3);
RealInterest = num(2:43,4);
% the years are imported as text, so take them from the raw cell array (assumed here) and convert
YearM = raw(2:43,2);
YearM = str2double(YearM);

% create the design matrix
desmat = cat(2,ones(42,1),RealInterest);

%and regress it
[beta2,b_CI,resids,rint,stats] = regress(GDP,desmat);
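As before, we can check the fit visually. A minimal sketch, assuming GDP, RealInterest, desmat, and beta2 are defined as above:

% plot the data and the fitted regression line (sorted so the line is drawn left to right)
figure(2), clf, hold on
plot(RealInterest,GDP,'ko','markerfacecolor','k','markersize',10)
[sortedIR,sidx] = sort(RealInterest);
plot(sortedIR,desmat(sidx,:)*beta2,'b-','linewidth',2)
xlabel('Real interest rate'), ylabel('GDP')
legend({'Data';'Model'})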

2. Multiple regression

A couple of things need to be considered before we run the model

  1. The interpretation of \( \beta_k \): a one-unit increase in the predictor \( x_k \) changes the dependent variable by \( \beta_k \) units, holding the other predictors constant.
  2. Sometimes the betas are not directly comparable because the predictors are on different scales, for example when time is measured in hours, minutes, or seconds and we want to relate it to calories. In that case we standardize them using the standard deviations: \( b_k = \beta_k \frac{s_{x_k}}{s_y} \), as sketched below.
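A minimal sketch of computing standardized coefficients in MATLAB, assuming X is an n-by-k matrix of predictors (without the intercept column) and y is the outcome vector:

% unstandardized coefficients: intercept first, then one beta per predictor
desmat = [ ones(size(X,1),1) X ];
beta   = desmat \ y;

% standardized coefficients: b_k = beta_k * s_xk / s_y
beta_std = beta(2:end) .* std(X)' ./ std(y);

% equivalently, z-score everything first and regress again (intercept ~ 0)
% beta_z = [ ones(size(X,1),1) zscore(X) ] \ zscore(y);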
%%
% VIDEO: Multiple regression
%
% Acknowledgement: Mike X Cohen, sincxpress.com
%

%%

% a clear MATLAB workspace is a clear mental workspace
close all; clear, clc


%% example: effects of sleep and study hours on exam scores
%%% create the data

exam_scores = [];
for ei=0:4
exam_scores = [ exam_scores 60*ones(1,6)+linspace(-1,5,6)*ei ];
end

exam_scores = exam_scores'; % force column vector
% repmat(x,y,z) tiles the array x into a y-by-z arrangement of copies
% linspace(2,8,6) creates 6 evenly spaced numbers from 2 to 8
hours_studied = repmat(linspace(2,8,6),1,5)';

% the transpose of 30 evenly spaced values between 6 and 10 (a column vector)
ave_sleep_hrs = linspace(6,10,30)';

%% plot the data

% stratify by hours studied
figure(1), clf, hold on

% fewer than 4 hours studied
plotidx = hours_studied<4.1;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'ko','markerfacecolor','k','markersize',10)

% 5-6 hours studied
plotidx = hours_studied>4.9 & hours_studied<6.1;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'rs','markerfacecolor','r','markersize',10)

% more than 6 hours
plotidx = hours_studied>6;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'b^','markerfacecolor','b','markersize',10)

xlabel('Hours of sleep'), ylabel('Exam score')
legend({'<4 hours studied';'5-6 hours studied';'>7 hours studied'})

%% compute the multiple regression

% first create the design matrix
desmat = [ ones(30,1) ave_sleep_hrs hours_studied ave_sleep_hrs.*hours_studied ];

[beta,b_CI,resids,rint,stats] = regress(exam_scores,desmat);

% stats vector is R2, F, p-val, error variance
stats

%% inspect the residuals

figure(2), clf
plot(exam_scores,resids,'ks','markerfacecolor','k')
xlabel('Exam scores')
ylabel('Model residuals')

%% correlation of IVs

corr(desmat(:,2:end))
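% high correlations among the predictors (multicollinearity) make the individual
% beta estimates unstable; in particular, the interaction column is usually
% strongly correlated with its two main-effect columns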

%% done.

Next is the same analysis, but using a different mechanism for the regression: the fitlm function.


%
% TEACHER: Mike X Cohen, sincxpress.com
%

%%

% a clear MATLAB workspace is a clear mental workspace
close all; clear, clc

%% example: effects of sleep and study hours on exam scores
%%% create the data

exam_scores = [];
for ei=0:4
exam_scores = [ exam_scores 60*ones(1,6)+linspace(-1,5,6)*ei ];
end

exam_scores = exam_scores'; % force column vector
hours_studied = repmat(linspace(2,8,6),1,5)';
ave_sleep_hrs = linspace(6,10,30)';

%% plot the data

% stratify by hours studied
figure(1), clf, hold on

% fewer than 4 hours studied
plotidx = hours_studied<4.1;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'ko','markerfacecolor','k','markersize',10)

% 5-6 hours studied
plotidx = hours_studied>4.9 & hours_studied<6.1;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'rs','markerfacecolor','r','markersize',10)

% more than 6 hours
plotidx = hours_studied>6;
plot(ave_sleep_hrs(plotidx),exam_scores(plotidx),'b^','markerfacecolor','b','markersize',10)

xlabel('Hours of sleep'), ylabel('Exam score')
legend({'<4 hours studied';'5-6 hours studied';'>7 hours studied'})

%% compute the multiple regression

% first create the design matrix
desmat = [ ones(30,1) ave_sleep_hrs hours_studied ave_sleep_hrs.*hours_studied ];

[beta,b_CI,resids,rint,stats] = regress(exam_scores,desmat);

% stats vector is R2, F, p-val, error variance
stats

%% inspect the residuals

figure(2), clf
plot(exam_scores,resids,'ks','markerfacecolor','k')
xlabel('Exam scores')
ylabel('Model residuals')

%% using fitlm

% with explicit intercept
lm1 = fitlm(desmat,exam_scores,'VarNames',{'Intercept','Ave sleep','Study hours','Interaction','Exam scores'})

% without intercept
% lm2 = fitlm(desmat(:,2:end),exam_scores,'VarNames',{'Ave sleep','Studyhours','Interaction','Exam scores'});

% without interaction term
lm3 = fitlm(desmat(:,2:3),exam_scores,'VarNames',{'Ave sleep','Study hours','Exam scores'});

% specify the model
lm4 = fitlm(desmat(:,2:3),exam_scores,'exam ~ sleep*study',...
'VarNames',{'sleep','study','exam'});
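% note: in the Wilkinson formula 'exam ~ sleep*study', the * expands to both
% main effects plus their interaction, i.e. exam ~ 1 + sleep + study + sleep:study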

%% correlation of IVs

corr(desmat(:,2:end))

%% done.

3. Polynomial regression

%%
%   COURSE: Master statistics and machine learning: intuition, math, code										
%      URL: udemy.com/course/statsml_x/?couponCode=202006 
% 
%  SECTION: Regression
%    VIDEO: Polynomial regression
% 
%  TEACHER: Mike X Cohen, sincxpress.com
%

%%

% a clear MATLAB workspace is a clear mental workspace
close all; clear; clc
pkg load statistics % load Octave stats package

%% generate the data

n  = 30;
x  = linspace(-2,4,n);
y1 = x.^2 + randn(1,n);
y2 = x.^3 + randn(1,n);

%%% plot the data
figure(1), clf, hold on
plot(x,y1,'ko','markersize',10,'markerfacecolor','r','linewidth',2)
plot(x,y2,'ko','markersize',10,'markerfacecolor','g','linewidth',2)
legend({'Quadratic','Cubic'})

%% now for a polynomial fit
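
% polyfit returns the polynomial coefficients ordered from highest to lowest
% power; polyval then evaluates that polynomial at the given x values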

% for y1
pterms = polyfit(x,y1,2);
yHat1 = polyval(pterms,x);
plot(x,yHat1,'r','linewidth',2)

% for y2
pterms = polyfit(x,y2,3);
yHat2 = polyval(pterms,x);
plot(x,yHat2,'g','linewidth',2)

%% compute R2

% compute R2 for several polynomial orders
orders = 1:5;

% output matrices
r2 = zeros(2,length(orders));
sse = zeros(2,length(orders));

% the loop!
for oi=1:length(orders)
    
    % fit the model with oi terms
    pterms = polyfit(x,y1,orders(oi));
    yHat = polyval(pterms,x);
    
    % compute R2
    ss_eta = sum((y1-yHat).^2); % numerator
    ss_tot = sum((y1-mean(y1)).^2); % denominator
    r2(1,oi) = 1 - ss_eta/ss_tot; % R^2
    sse(1,oi) = ss_eta; % store just the SSe for model comparison later
    
    
    %%% repeat for y2
    pterms = polyfit(x,y2,orders(oi));
    yHat = polyval(pterms,x);
    ss_eta = sum((y2-yHat).^2);
    ss_tot = var(y2)*(n-1);
    r2(2,oi) = 1 - ss_eta/ss_tot;
    sse(2,oi) = ss_eta; % store just the SSe for model comparison later
end

% plot the results
figure(2), clf
subplot(211)
plot(orders,r2,'s-','linewidth',2,'markerfacecolor','w','markersize',12)
set(gca,'xlim',[orders(1)-.5 orders(end)+.5],'xtick',orders)
xlabel('Polynomial model order')
ylabel('Model R^2')
legend({'y_1 = x^2';'y_2 = x^3'})


% compute the Bayes Information Criterion
bic = n*log(sse) + orders*log(n);
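% note: a more standard form is BIC = n*log(SSE/n) + k*log(n), with k the number of
% estimated parameters (order+1 here); the terms omitted above are constant across
% models, so they do not change which order gives the lowest BIC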
subplot(212)
plot(orders,bic,'s-','linewidth',2,'markerfacecolor','w','markersize',12)
set(gca,'xlim',[orders(1)-.5 orders(end)+.5],'xtick',orders)
xlabel('Polynomial model order')
ylabel('BIC')
zoom on

%% done.

4. Logistic regression

%%
%   COURSE: Master statistics and machine learning: intuition, math, code										
%      URL: udemy.com/course/statsml_x/?couponCode=202006 
% 
%  SECTION: Regression
%    VIDEO: Logistic regression
% 
%  TEACHER: Mike X Cohen, sincxpress.com
%

%%

% a clear MATLAB workspace is a clear mental workspace
close all; clear; clc
pkg load statistics % load Octave stats package

%% generate the data

% columns are: exam (pass/fail), study hours, sleep hours
data = [
    0	7.9	4.4
    0	7.9	5.2
    0	2.8	7.5
    0	5.4	4.6
    0	6.1	5.5
    0	4.5	6.1
    0	6.9	6.6
    0	2.3	3.1
    0	1.9	5.9
    0	1	3.2
    1	3.1	7.5
    1	5.7	7.8
    1	5.6	6.1
    1	4.7	5.4
    1	4.2	10.5
    1	2	8.2
    1	7.7	7.2
    1	6.5	7.2
    1	5.1	5.9
    1	3.7	7.9 ];

% show the data
figure(1), clf, hold on
plot([data(:,1)-.05 data(:,1)+.05]',data(:,2:3)','k','HandleVisibility','off')
plot(data(:,1)-.05,data(:,2),'ks','markerfacecolor',[255 204 255]/255,'markersize',15)
plot(data(:,1)+.05,data(:,3),'ks','markerfacecolor',[100 255 255]/255,'markersize',15)

set(gca,'xtick',[0 1],'xlim',[-.5 1.5],'ylim',[0 11],'xticklabel',{'Fail';'Pass'})
ylabel('Hours sleep or study')
legend({'Study';'Sleep'})

%% now run the logistic regression

[intercept,betas,~,~,~,predvals] = logistic_regression(data(:,1),data(:,2:3),1);
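% note: logistic_regression comes from the Octave statistics package; in MATLAB a
% similar model could be fit with something like
% fitglm(data(:,2:3),data(:,1),'linear','Distribution','binomial')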

%% compute the model prediction accuracy
%%% IMPORTANT NOTE: Octave outputs the predicted *accuracy for each label*, not the prediction for label=1

% convert to accuracy
aveacc = mean(predvals>.5);


%% plotting

figure(2), clf, hold on

plot(1:10,1-predvals(1:10),'ks','markerfacecolor','b','markersize',10)
plot(11:20,predvals(11:20),'ks','markerfacecolor','b','markersize',10,'HandleVisibility','off')
plot(get(gca,'xlim'),[1 1]*.5,'--','color','b','linewidth',2)
plot([1 1]*10.5,get(gca,'ylim'),'--','color','b','linewidth',2)

xlabel('Individual'), ylabel('p(pass)')
title([ 'Overall accuracy: ' num2str(aveacc) ])
legend({'Prediction';'Cut-off'})

%% done.

5. What to do with missing data
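
A minimal sketch of the most common approach, listwise deletion: drop every row that contains a NaN before calling regress (assuming a data matrix with the outcome in column 1 and the predictors in the remaining columns):

% keep only the rows with no missing values
goodrows = ~any(isnan(data),2);
y = data(goodrows,1);
X = [ ones(sum(goodrows),1) data(goodrows,2:end) ];

[beta,b_CI,resids,rint,stats] = regress(y,X);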
