How do the t-distribution and standard normal distribution differ, and why is t-distribution used more?

For statistical inference (e.g., hypothesis testing or computing confidence intervals), why do we use the t-distribution instead of the standard normal distribution? My class started with the standard normal distribution and shifted to the t-distribution, and I am not fully sure why. Is it because t-distributions can a) deal with small sample sizes (because they give more emphasis to the tails) or b) be more robust to a non-normally distributed sample?

Possibly related: stats.stackexchange.com/questions/285649/…
– Henry
Aug 22 at 23:09

Searches like stats.stackexchange.com/search?q=t-distribution+normal and stats.stackexchange.com/search?q=t-test+normal will include a number of relevant posts (and a lot of other hits, so you may need to add further keywords to reduce the clutter).
– Glen_b♦
Aug 23 at 2:18

2 Answers
2

The normal distribution (which is almost certainly returning in later chapters of your course) is much easier to motivate than the t distribution for students new to the material. The reason why you are learning about the t distribution is more or less for your first reason: the t distribution takes a single parameter—sample size minus one—and more correctly accounts for uncertainty due to (small) sample size than the normal distribution when making inferences about a sample mean of normally-distributed data, assuming that the true variance is unknown.

With increasing sample size, the t and standard normal distributions are approximately equally robust with respect to deviations from normality (as sample size increases, the t distribution converges to the standard normal distribution). Nonparametric tests (which I start teaching about halfway through my intro stats course) are generally much more robust to non-normality than tests based on either the t or normal distributions.
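The convergence described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_two_sided_tail` are made up for this illustration, and the tail probability is obtained by Simpson's rule rather than by a statistics package.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_tail(q, df, steps=20000):
    """P(|T| > q) for T ~ t(df), using Simpson's rule on [0, q].

    `steps` must be even for Simpson's rule.
    """
    h = q / steps
    total = t_pdf(0.0, df) + t_pdf(q, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-q < T < q), by symmetry
    return 1 - central

# Probability of landing beyond +/-1.96 under the standard normal
normal_tail = 2 * (1 - NormalDist().cdf(1.96))

# Same probability under t with few, moderate, and many degrees of freedom
t_tails = {df: t_two_sided_tail(1.96, df) for df in (4, 29, 1000)}

for df, p in t_tails.items():
    print(f"t, df={df}: P(|T| > 1.96) = {p:.4f}")
print(f"standard normal: P(|Z| > 1.96) = {normal_tail:.4f}")
```

The t tails are heavier than the normal tail at every finite number of degrees of freedom, and the gap shrinks as the degrees of freedom grow.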

Finally, you are likely going to learn tests and confidence intervals for many different distributions by the end of your course (F, $\chi^2$, rank distributions, at least in their table p-values, for example).

Thank you so much for this awesome response. I now get that t-distributions can better account for small sample sizes. However, if the sample size is large (> 30), it doesn't matter whether we use a t or standard normal distribution, right?
– Jane Sully
Aug 22 at 19:26

they become very similar when the degrees of freedom rise.
– Bernhard
Aug 22 at 19:37

@JaneSully Sure, but, for inference about means of normal data, it is never wrong to use the t distribution.
– Alexis
Aug 22 at 21:13

(Also, when/if you like an answer enough to say that it has answered your question, you can "accept" it by clicking on the check mark to the top left of the question. :).
– Alexis
Aug 22 at 21:24

I disagree with this statement: "the t distribution takes a single parameter—sample size minus one—and more correctly accounts for uncertainty due to (small) sample size than the normal distribution when making inferences about a sample mean of normally-distributed data." E.g. see this lecture: onlinecourses.science.psu.edu/stat414/node/173 There's no need for t-distribution on Gaussian data when standard deviation is known. The key here is whether you do or do not know the variance, not the n-1 adjustment
– Aksakal
Aug 23 at 3:49

The t-distribution is used in inference instead of the normal because the theoretical distribution of some estimators is normal (Gaussian) only when the standard deviation is known; when it is unknown, the theoretical distribution is Student t.

We rarely know the standard deviation. Usually, we estimate it from the sample, so for many estimators it is theoretically more sound to use the Student t distribution rather than the normal.

Some estimators are consistent, i.e., in layman's terms, they get better as the sample size increases. The Student t distribution approaches the normal as the sample size grows.

Example: sample mean

Consider the true mean $\mu$ of the population from which the sample $x_1,x_2,\dots,x_n$ is drawn. We can estimate it using the usual average estimator: $\bar x=\frac 1 n\sum_{i=1}^n x_i$, which you may call the sample mean.

If we want to make inference statements about the mean, such as whether the true mean $\mu<0$, we can use the sample mean $\bar x$, but we need to know what its distribution is. It turns out that if we knew the standard deviation $\sigma$ of $x_i$, then the sample mean would be distributed around the true mean according to a Gaussian: $\bar x\sim\mathcal N(\mu,\sigma^2/n)$, for large enough $n$.
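As a worked version of the known-variance case, here is a short sketch. It assumes the $x_i$ are independent with mean $\mu$ and known standard deviation $\sigma$; the label $Z$ is introduced here for illustration. Standardizing the sample mean gives

$$Z=\frac{\bar x-\mu}{\sigma/\sqrt n}\sim\mathcal N(0,1),$$

so an (approximately) 95% confidence interval for $\mu$ would be

$$\bar x\pm 1.96\,\frac{\sigma}{\sqrt n}.$$

Every quantity in this interval except $\sigma$ is computed from the sample, which is why the interval is only usable as written when $\sigma$ is known.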

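The problem is that we rarely know $\sigma$, but we can compute an estimate $\hat\sigma$ of it from the sample using one of the estimators. In this case the distribution of the standardized sample mean, $(\bar x-\mu)/(\hat\sigma/\sqrt n)$, is no longer Gaussian; for normally distributed data it follows a Student t distribution with $n-1$ degrees of freedom.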
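The practical consequence for the sample mean example can be seen in a small simulation. This is only a sketch under stated assumptions: the data are standard normal, the sample size is 5, the function name `coverage` is made up for this illustration, and 2.776 is the usual table value for the 97.5th percentile of the t distribution with 4 degrees of freedom.

```python
import math
import random

def coverage(crit, n=5, reps=20000, mu=0.0, sigma=1.0, seed=0):
    """Fraction of intervals xbar +/- crit * s / sqrt(n) that contain mu.

    Samples are drawn from a normal distribution; s is the sample
    standard deviation (n - 1 in the denominator), so sigma is treated
    as unknown when the interval is built.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half_width = crit * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Nominal 95% intervals built with the normal critical value (1.96)
cover_z = coverage(1.96)
# Nominal 95% intervals built with the t critical value for df = 4
cover_t = coverage(2.776)

print(f"normal critical value: coverage = {cover_z:.3f}")
print(f"t critical value:      coverage = {cover_t:.3f}")
```

With the normal critical value and an estimated standard deviation, the intervals cover the true mean noticeably less often than the nominal 95%; with the t critical value, the coverage is close to 95%.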
