What is normality in regression? An example from Stata

Posted by: admin 1 year, 9 months ago

What is normality in regression?

Normality means that the regression residuals follow a normal distribution: most values cluster around the center (zero), and the tails thin out symmetrically to the left and right.

Many researchers believe that multiple regression requires normality. This is not the case. Normality of residuals is only required for valid hypothesis testing; that is, the normality assumption assures that the p-values for the t-tests and F-test will be valid. Normality is not required to obtain unbiased estimates of the regression coefficients. OLS regression merely requires that the residuals (errors) be identically and independently distributed.
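The claim that coefficient estimates do not depend on normal errors can be illustrated with a small simulation. The sketch below is not part of the elemapi2 example: the variable names, seed, sample size, and true coefficients (intercept 2, slope 3) are made up for illustration. The errors come from a centered chi-squared distribution, which is strongly skewed, yet the estimated slope should land close to 3. Note that the sketch starts with clear, which drops any data currently in memory.

```stata
* Hypothetical simulation: OLS with skewed, non-normal errors.
* All names and numbers here are illustrative assumptions.
clear
set seed 12345
set obs 5000

* One predictor and an error term that is skewed but has mean zero
* (a chi-squared variable with 1 df has mean 1, so we subtract 1).
generate x = rnormal()
generate e = rchi2(1) - 1

* True model: intercept 2, slope 3.
generate y = 2 + 3*x + e

* The coefficient on x should be close to 3 despite the skewed errors.
regress y x
```

A single run only illustrates the point and does not prove unbiasedness; repeating it with different seeds should give slopes scattered around 3.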

Furthermore, there is no assumption or requirement that the predictor variables be normally distributed. If this were the case, then we would not be able to use dummy-coded variables in our models.

After we run a regression analysis, we can use the predict command to create residuals and then use commands such as kdensity, qnorm and pnorm to check the normality of the residuals.

Let’s use the elemapi2 data file for these analyses. We will predict academic performance (api00) from the percent of students receiving free meals (meals), the percent of English language learners (ell), and the percent of teachers with emergency credentials (emer).

use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2
regress api00 meals ell emer

  Source |       SS       df       MS                  Number of obs =     400
---------+------------------------------               F(  3,   396) =  673.00
   Model |  6749782.75     3  2249927.58               Prob > F      =  0.0000
Residual |  1323889.25   396  3343.15467               R-squared     =  0.8360
---------+------------------------------               Adj R-squared =  0.8348
   Total |  8073672.00   399  20234.7669               Root MSE      =   57.82

------------------------------------------------------------------------------
   api00 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   meals |  -3.159189   .1497371    -21.098   0.000      -3.453568   -2.864809
     ell |  -.9098732   .1846442     -4.928   0.000      -1.272878   -.5468678
    emer |  -1.573496    .293112     -5.368   0.000      -2.149746   -.9972456
   _cons |   886.7033    6.25976    141.651   0.000       874.3967    899.0098
------------------------------------------------------------------------------

We then use the predict command to generate residuals.

predict r, resid

Below we use the kdensity command to produce a kernel density plot with the normal option requesting that a normal density be overlaid on the plot. kdensity stands for kernel density estimate. It can be considered a histogram with narrow bins and a moving average.

kdensity r, normal
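Since a kernel density plot is close in spirit to a histogram, a histogram with a normal overlay offers a quick cross-check, and the detailed summary reports skewness and kurtosis (about 0 and 3 for a normal distribution). This is a minimal sketch that assumes the residual variable r created by predict above is still in memory.

```stata
* Assumes r was created by "predict r, resid" above.

* Histogram of the residuals with a normal density overlaid.
histogram r, normal

* Detailed summary: the mean should be near 0, skewness near 0,
* and kurtosis near 3 if the residuals are roughly normal.
summarize r, detail
```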

The pnorm command graphs a standardized normal probability (P-P) plot while qnorm plots the quantiles of a variable against the quantiles of a normal distribution. pnorm is sensitive to non-normality in the middle range of data and qnorm is sensitive to non-normality near the tails. As you see below, the results from pnorm show no indications of non-normality, while the qnorm command shows a slight deviation from normal at the upper tail, as can be seen in the kdensity above. Nevertheless, this seems to be a minor and trivial deviation from normality. We can accept that the residuals are close to a normal distribution.

pnorm r

qnorm r

There are also numerical tests for normality. One of them, called iqr, was written by Lawrence C. Hamilton, Dept. of Sociology, Univ. of New Hampshire. You can get this program from within Stata by typing search iqr (see How can I use the search command to search for programs and get additional help? for more information about using search).

iqr stands for inter-quartile range, and the command assumes that the distribution is symmetric. Severe outliers are points that lie more than 3 inter-quartile ranges below the first quartile or more than 3 inter-quartile ranges above the third quartile. The presence of any severe outliers should be sufficient evidence to reject normality at a 5% significance level. Mild outliers are common in samples of any size. In our case, we don’t have any severe outliers, and the distribution seems fairly symmetric, so the residuals have an approximately normal distribution.

iqr r

   mean=  7.4e-08         std.dev.=   57.6          (n= 400)
 median= -3.657    pseudo std.dev.=  56.69        (IQR=  76.47)
10 trim= -1.083
                                               low         high
                                               -------------------
                                inner fences   -154.7       151.2
                           # mild outliers     1           5
                           % mild outliers     0.25%       1.25%

                                outer fences   -269.4       265.9
                           # severe outliers   0           0
                           % severe outliers   0.00%       0.00%
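The fences in this output can be rebuilt by hand from the quartiles, which makes the outlier rule concrete. The sketch below assumes r is still in memory and that iqr uses the same quartile definition as summarize, detail (an assumption, so small differences in the last digits are possible). The inner fences sit 1.5 inter-quartile ranges beyond the quartiles and the outer fences 3 inter-quartile ranges beyond them. The pseudo standard deviation is the IQR divided by 1.349, the IQR of a standard normal distribution; here 76.47 / 1.349 is about 56.69, matching the output above.

```stata
* Assumes r was created by "predict r, resid" above.
* Save the quartiles first, because later commands overwrite r().
quietly summarize r, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'

display "IQR               = " (`iqr')
display "pseudo std.dev.   = " (`iqr'/1.349)
display "lower inner fence = " (`q1' - 1.5*`iqr')
display "upper inner fence = " (`q3' + 1.5*`iqr')
display "lower outer fence = " (`q1' - 3*`iqr')
display "upper outer fence = " (`q3' + 3*`iqr')

* Number of severe outliers: points beyond either outer fence.
count if !missing(r) & (r < `q1' - 3*`iqr' | r > `q3' + 3*`iqr')
```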

Another test available is the swilk command, which performs the Shapiro-Wilk W test for normality. The p-value is computed under the null hypothesis that the distribution is normal. In our example, it is large (.51), so we cannot reject the hypothesis that r is normally distributed.

swilk r

                   Shapiro-Wilk W test for normal data
 Variable |    Obs           W         V          z   Pr > z
 ---------+-------------------------------------------------
        r |    400     0.99641     0.989     -0.025  0.51006
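Stata ships with other normality tests that can serve as a rough cross-check on swilk. The sketch below assumes r is still in memory and runs two of them: sktest, which is based on skewness and kurtosis, and sfrancia, the Shapiro-Francia W' test. No output is shown here because it was not part of the original analysis. As with swilk, a large p-value means normality cannot be rejected.

```stata
* Assumes r was created by "predict r, resid" above.

* Skewness and kurtosis test for normality.
sktest r

* Shapiro-Francia W' test for normality.
sfrancia r
```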
