Hypothesis Testing with Z-Test: Significance Level and Rejection Region

Iliya Valchanov

If you want to understand why hypothesis testing works, you should first have an idea about the significance level and the rejection region. We assume you already know what a hypothesis is, so let’s jump right into the action.

What Is the Significance Level?

First, we must define the term significance level .

Normally, we aim to reject the null if it is false.

However, as with any test, there is a small chance that we could get it wrong and reject a null hypothesis that is true.

How Is the Significance Level Denoted?

The significance level is denoted by α and is the probability of rejecting the null hypothesis when it is, in fact, true.

In other words, it is the probability of making this error.

Typical values for α are 0.01, 0.05 and 0.1. It is a value that we select based on the certainty we need. In most cases, the choice of α is determined by the context we are operating in, but 0.05 is the most commonly used value.

A Case in Point

Say we need to test whether a machine is working properly. We would expect the test to make few or no mistakes. Since we want to be very precise, we should pick a low significance level such as 0.01.

The famous Coca-Cola glass bottle holds 12 ounces. If the machine pours 12.1 ounces, some of the liquid spills and the label gets damaged as well. So, in certain situations, we need to be as accurate as possible.

Higher Degree of Error

However, if we are analyzing humans or companies, we would expect more random, or at least uncertain, behavior. Hence, we can tolerate a higher degree of error.

For instance, if we want to estimate how much Coca-Cola its consumers drink on average, the difference between 12 ounces and 12.1 ounces will not be that crucial. So, we can choose a higher significance level like 0.05 or 0.1.

Hypothesis Testing: Performing a Z-Test

Now that we have an idea about the significance level, let’s get to the mechanics of hypothesis testing.

Imagine you are consulting a university and want to carry out an analysis on how students are performing on average.

The university dean believes that on average students have a GPA of 70%. Being the data-driven researcher that you are, you can’t simply agree with his opinion, so you start testing.

The null hypothesis is: The population mean grade is 70%.

This is a hypothesized value.

The alternative hypothesis is: The population mean grade is not 70%. In symbols, H0: μ = 70% and H1: μ ≠ 70%.

Visualizing the Grades

Assuming that the population of grades is normally distributed, all grades received by students form a bell curve centered on the true population mean.

Performing a Z-test

Now, a test we would normally perform is the Z-test. The formula is:

Z equals the sample mean, minus the hypothesized mean, divided by the standard error:

Z = (x̄ − μ0) / SE

The idea is the following.

We are standardizing, or scaling, the sample mean we got. If the sample mean is close enough to the hypothesized mean, then Z will be close to 0. Otherwise, it will be far away from it. Naturally, if the sample mean is exactly equal to the hypothesized mean, Z will be 0.

In all these cases, we would accept the null hypothesis.
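
The standardization above can be sketched in a few lines of Python. The numbers are hypothetical (a sample of 40 students averaging 73% with a standard deviation of 10); only the hypothesized mean of 70% comes from the example.

```python
import math

def z_statistic(sample_mean, hypothesized_mean, sample_std, n):
    """Scale the distance between the sample mean and the
    hypothesized mean by the standard error of the mean."""
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical sample: 40 students averaging 73% with std dev 10.
print(round(z_statistic(73, 70, 10, 40), 2))   # 1.9
# A sample mean exactly equal to the hypothesized mean gives Z = 0.
print(z_statistic(70, 70, 10, 40))             # 0.0
```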

What Is the Rejection Region?

The question here is the following:

How big should Z be for us to reject the null hypothesis?

Well, there is a cut-off line. Since we are conducting a two-sided or a two-tailed test, there are two cut-off lines, one on each side.

When we calculate Z, we will get a value. If this value falls into the middle part, then we cannot reject the null. If it falls outside, in the shaded region, then we reject the null hypothesis.

That is why the shaded part is called the rejection region.

What Does the Rejection Region Depend on?

The size of the cut-off area depends on the significance level.

Say the level of significance, α, is 0.05. Then we have α divided by 2, or 0.025, on the left side and 0.025 on the right side.
These are values we can check in the z-table. For a tail area of 0.025, the corresponding z-value is 1.96. So, the cut-off lines are at 1.96 on the right side and minus 1.96 on the left side.

Therefore, if the value we get for Z from the test is lower than minus 1.96 or higher than 1.96, we will reject the null hypothesis. Otherwise, we will accept it.
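
As a quick sketch, this two-tailed decision rule at α = 0.05 comes down to a single comparison (the function name and the sample z values are ours, for illustration):

```python
def two_tailed_decision(z, cutoff=1.96):
    """Reject the null when Z falls beyond the cut-off in either tail."""
    return "reject the null" if abs(z) > cutoff else "cannot reject the null"

print(two_tailed_decision(2.3))    # reject the null
print(two_tailed_decision(-0.8))   # cannot reject the null
print(two_tailed_decision(-2.1))   # reject the null
```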

That’s more or less how hypothesis testing works.

We scale the sample mean with respect to the hypothesized value. If Z is close to 0, then we cannot reject the null. If it is far away from 0, then we reject the null hypothesis.

Example of a One-Tailed Test

What about one-sided tests? We have those too!

Let’s consider the following situation.

Paul says data scientists earn more than $125,000. So, the null hypothesis is H0: μ > $125,000.

The alternative is H1: μ ≤ $125,000.

Using the same significance level of 0.05, this time the whole rejection region is on the left. So, the rejection region has an area of α. Looking at the z-table, that corresponds to a z-score of 1.645. Since it is on the left, it carries a minus sign.

Accept or Reject

Now, when calculating our test statistic Z, if we get a value lower than −1.645, we would reject the null hypothesis. We do that because we have statistical evidence that the data scientist salary is less than $125,000. Otherwise, we would accept it.
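
This left-tailed decision can be sketched with made-up sample numbers (100 data scientists averaging $121,000 with a standard deviation of $20,000; only the $125,000 claim and the −1.645 cut-off come from the example):

```python
import math

# Hypothetical sample for the salary example.
sample_mean, hypothesized_mean, s, n = 121_000, 125_000, 20_000, 100

# Standardize the sample mean against the claimed value.
z = (sample_mean - hypothesized_mean) / (s / math.sqrt(n))
print(round(z, 2))   # -2.0

# The whole rejection region is in the left tail.
print("reject the null" if z < -1.645 else "cannot reject the null")
```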

Another One-Tailed Test

To exhaust all possibilities, let’s explore another one-tailed test.

Say the university dean told you that the average GPA students get is lower than 70%. In that case, the null hypothesis is:

H0: μ < 70%.

While the alternative is:

H1: μ ≥ 70%.

In this situation, the rejection region is on the right side. So, if the test statistic is bigger than the cut-off z-score, we would reject the null; otherwise, we wouldn’t.

Importance of the Significance Level and the Rejection Region

To sum up, the significance level and the rejection region are crucial to hypothesis testing. The level of significance sets how much error we are willing to tolerate; we (the researchers) choose it depending on how big of a difference a possible error could make. The rejection region, in turn, tells us whether or not to reject the null hypothesis. Once you put both of them to use, you will see how much they streamline your work.




Rejection Region (Critical Region) for Statistical Tests

What is a Rejection Region?

A rejection region (also called a critical region ) is an area of a graph where you would reject the null hypothesis if your test results fall into that area. In other words, if your results fall into that area then they are statistically significant .

The main purpose of statistics is to test theories or results from experiments. For example, you might have invented a new fertilizer that you think makes plants grow 50% faster. In order to prove your theory is true, your experiment must:

  • Be repeatable.
  • Be compared to a known fact about plants (in this example, probably the average growth rate of plants without the fertilizer).

We call this type of statistical testing a hypothesis test. The rejection region is a part of the testing process. Specifically, it is an area of probability that tells you if your theory (your “hypothesis”) is probably true.

Two Tailed vs One Tailed

The type of test is determined by your null hypothesis statement. For example, if your statement asks “Is the average growth rate greater than 10cm a day?” that’s a one tailed test, because you are only interested in one direction (greater than 10cm a day).

You could also have a single rejection region for “less than”. For example, “Is the growth rate less than 10cm a day?” A two tailed test , with two regions, would be used when you want to know if there’s a difference in both directions (greater than and less than).

Rejection Regions and Alpha Levels

You, as a researcher, choose the alpha level you are willing to accept. For example, if you wanted to be 95% confident that your results are significant , you would choose a 5% alpha level (100% – 95%). That 5% level is the rejection region . For a one tailed test , the 5% would be in one tail. For a two tailed test, the rejection region would be in two tails.

Rejection Regions and P-Values

There are two ways you can test a hypothesis: with a p-value and with a critical value .

P-value method : When you run a hypothesis test (for example, a z test), the result of that test will be a p value. The p value is a “probability value.” It’s what tells you if your hypothesis statement is probably true or not. If the p-value is less than your alpha level, you have statistically significant results; you can reject the null hypothesis. If the p-value is greater than or equal to your alpha level, your results aren’t enough to throw out the null hypothesis. In the example of the plant fertilizer, a statistically significant result would be one that shows the fertilizer does indeed make plants grow faster (compared to other fertilizers).

Rejection region method with a critical value : The steps are exactly the same. However, instead of comparing a p-value to alpha, you calculate a critical value and compare your test statistic to it. If the test statistic falls inside the rejection region, you reject the null hypothesis.
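
The two methods always agree. A minimal sketch using Python’s statistics.NormalDist for a two-tailed z test (the z value of 2.1 is made up for illustration):

```python
from statistics import NormalDist

def p_value_two_tailed(z):
    """Two-tailed p-value: the probability of a z at least this extreme."""
    return 2 * NormalDist().cdf(-abs(z))

z, alpha = 2.1, 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

print(p_value_two_tailed(z) < alpha)   # True  (p-value method)
print(abs(z) > critical)               # True  (critical value method)
```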

Chapter 7: Introduction to Hypothesis Testing

Key Terms

  • alternative hypothesis
  • critical value
  • effect size
  • null hypothesis
  • probability value
  • rejection region
  • significance level
  • statistical power
  • statistical significance
  • test statistic
  • Type I error
  • Type II error

This chapter lays out the basic logic and process of hypothesis testing. We will perform z  tests, which use the z  score formula from Chapter 6 and data from a sample mean to make an inference about a population.

Logic and Purpose of Hypothesis Testing

A hypothesis is a prediction that is tested in a research study. The statistician R. A. Fisher explained the concept of hypothesis testing with a story of a lady tasting tea. Here we will present an example based on James Bond who insisted that martinis should be shaken rather than stirred. Let’s consider a hypothetical experiment to determine whether Mr. Bond can tell the difference between a shaken martini and a stirred martini. Suppose we gave Mr. Bond a series of 16 taste tests. In each test, we flipped a fair coin to determine whether to stir or shake the martini. Then we presented the martini to Mr. Bond and asked him to decide whether it was shaken or stirred. Let’s say Mr. Bond was correct on 13 of the 16 taste tests. Does this prove that Mr. Bond has at least some ability to tell whether the martini was shaken or stirred?

This result does not prove that he does; it could be he was just lucky and guessed right 13 out of 16 times. But how plausible is the explanation that he was just lucky? To assess its plausibility, we determine the probability that someone who was just guessing would be correct 13/16 times or more. This probability can be computed to be .0106. This is a pretty low probability, and therefore someone would have to be very lucky to be correct 13 or more times out of 16 if they were just guessing. So either Mr. Bond was very lucky, or he can tell whether the drink was shaken or stirred. The hypothesis that he was guessing is not proven false, but considerable doubt is cast on it. Therefore, there is strong evidence that Mr. Bond can tell whether a drink was shaken or stirred.
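
The .0106 figure can be checked directly with a binomial tail sum, since each taste test is, under the guessing hypothesis, a fair coin flip:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for a binomial(n, p): the chance of getting
    at least k correct answers out of n by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Mr. Bond: 13 or more correct out of 16 by pure guessing
print(round(prob_at_least(13, 16), 4))   # 0.0106
```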

Let’s consider another example. The case study Physicians’ Reactions sought to determine whether physicians spend less time with obese patients. Physicians were sampled randomly and each was shown a chart of a patient complaining of a migraine headache. They were then asked to estimate how long they would spend with the patient. The charts were identical except that for half the charts, the patient was obese and for the other half, the patient was of average weight. The chart a particular physician viewed was determined randomly. Thirty-three physicians viewed charts of average-weight patients and 38 physicians viewed charts of obese patients.

The mean time physicians reported that they would spend with obese patients was 24.7 minutes as compared to a mean of 31.4 minutes for normal-weight patients. How might this difference between means have occurred? One possibility is that physicians were influenced by the weight of the patients. On the other hand, perhaps by chance, the physicians who viewed charts of the obese patients tend to see patients for less time than the other physicians. Random assignment of charts does not ensure that the groups will be equal in all respects other than the chart they viewed. In fact, it is certain the groups differed in many ways by chance. The two groups could not have exactly the same mean age (if measured precisely enough such as in days). Perhaps a physician’s age affects how long the physician sees patients. There are innumerable differences between the groups that could affect how long they view patients. With this in mind, is it plausible that these chance differences are responsible for the difference in times?

To assess the plausibility of the hypothesis that the difference in mean times is due to chance, we compute the probability of getting a difference as large or larger than the observed difference (31.4 − 24.7 = 6.7 minutes) if the difference were, in fact, due solely to chance. Using methods presented in later chapters, this probability can be computed to be .0057. Since this is such a low probability, we have confidence that the difference in times is due to the patient’s weight and is not due to chance.

The Probability Value

It is very important to understand precisely what the probability values mean. In the James Bond example, the computed probability of .0106 is the probability he would be correct on 13 or more taste tests (out of 16) if he were just guessing. It is easy to mistake this probability of .0106 as the probability he cannot tell the difference. This is not at all what it means.

The probability of .0106 is the probability of a certain outcome (13 or more out of 16) assuming a certain state of the world (James Bond was only guessing). It is not the probability that a state of the world is true. Although this might seem like a distinction without a difference, consider the following example. An animal trainer claims that a trained bird can determine whether or not numbers are evenly divisible by 7. In an experiment assessing this claim, the bird is given a series of 16 test trials. On each trial, a number is displayed on a screen and the bird pecks at one of two keys to indicate its choice. The numbers are chosen in such a way that the probability of any number being evenly divisible by 7 is .50. The bird is correct on 9/16 choices. We can compute that the probability of being correct nine or more times out of 16 if one is only guessing is .40. Since a bird who is only guessing would do this well 40% of the time, these data do not provide convincing evidence that the bird can tell the difference between the two types of numbers. As a scientist, you would be very skeptical that the bird had this ability. Would you conclude that there is a .40 probability that the bird can tell the difference? Certainly not! You would think the probability is much lower than .0001.

To reiterate, the probability value is the probability of an outcome (9/16 or better) and not the probability of a particular state of the world (the bird was only guessing). In statistics, it is conventional to refer to possible states of the world as hypotheses since they are hypothesized states of the world. Using this terminology, the probability value is the probability of an outcome given the hypothesis. It is not the probability of the hypothesis given the outcome.

This is not to say that we ignore the probability of the hypothesis. If the probability of the outcome given the hypothesis is sufficiently low, we have evidence that the hypothesis is false. However, we do not compute the probability that the hypothesis is false. In the James Bond example, the hypothesis is that he cannot tell the difference between shaken and stirred martinis. The probability value is low (.0106), thus providing evidence that he can tell the difference. However, we have not computed the probability that he can tell the difference.

The Null Hypothesis

The hypothesis that an apparent effect is due to chance is called the null hypothesis, written H0 (“H-naught”). In the Physicians’ Reactions example, the null hypothesis is that in the population of physicians, the mean time expected to be spent with obese patients is equal to the mean time expected to be spent with average-weight patients. This null hypothesis can be written as:

H0: μ_obese = μ_average

The null hypothesis in a correlational study of the relationship between high school grades and college grades would typically be that the population correlation is 0. This can be written as

H0: ρ = 0

Although the null hypothesis is usually that the value of a parameter is 0, there are occasions in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the U.S., as our null value and test for differences against that.

For now, we will focus on testing a value of a single mean against what we expect from the population. Using birth weight as an example, our null hypothesis takes the form:

H0: μ = 7.47

Keep in mind that the null hypothesis is typically the opposite of the researcher’s hypothesis. In the Physicians’ Reactions study, the researchers hypothesized that physicians would expect to spend less time with obese patients. The null hypothesis that the two types of patients are treated identically is put forward with the hope that it can be discredited and therefore rejected. If the null hypothesis were true, a difference as large as or larger than the sample difference of 6.7 minutes would be very unlikely to occur. Therefore, the researchers rejected the null hypothesis of no difference and concluded that in the population, physicians intend to spend less time with obese patients.

In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relationship between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups. If we are trying to predict job performance, we want to find a relationship between conscientiousness and evaluation scores. However, until we have evidence against it, we must use the null hypothesis as our starting point.

The Alternative Hypothesis

If the null hypothesis is rejected, then we will need some other explanation, which we call the alternative hypothesis, HA or H1. The alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. Thus, our alternative hypothesis is the mathematical way of stating our research question. If we expect our obtained sample mean to be above or below the null hypothesis value, which we call a directional hypothesis, then our alternative hypothesis takes the form

HA: μ > 7.47 or HA: μ < 7.47

based on the research question itself. We should only use a directional hypothesis if we have good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative:

HA: μ ≠ 7.47

We will set different criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative. To understand why, we need to see where our criteria come from and how they relate to z  scores and distributions.

Critical Values, p Values, and Significance Level

The significance level is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null).

Figure 7.1. The rejection region for a one-tailed test. (“ Rejection Region for One-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

The rejection region is bounded by a specific z  value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value , z crit  (“ z  crit”), or z * (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z  score corresponding to any area under the curve as we did in Unit 1 . If we go to the normal table, we will find that the z  score corresponding to 5% of the area under the curve is equal to 1.645 ( z = 1.64 corresponds to .0505 and z = 1.65 corresponds to .0495, so .05 is exactly in between them) if we go to the right and −1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and shading the distribution is helpful for keeping directionality straight.
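
Both critical values can be recovered from the inverse normal CDF; a short sketch with Python’s standard library (the helper name is ours, not a standard API):

```python
from statistics import NormalDist

def critical_value(alpha, two_tailed=False):
    """z* that bounds the rejection region for a given significance level."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

print(round(critical_value(0.05), 3))                   # 1.645
print(round(critical_value(0.05, two_tailed=True), 2))  # 1.96
```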

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z table, corresponds to critical values of z* = ±1.96. This is shown in Figure 7.2.

Figure 7.2. Two-tailed rejection region. (“ Rejection Region for Two-Tailed Test ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Thus, any z  score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z  scores in this way, the obtained value of z (sometimes called z  obtained and abbreviated z obt ) is something known as a test statistic , which is simply an inferential statistic used to test a null hypothesis. The formula for our z  statistic has not changed:

z = (M − μ) / (σ / √n)

Figure 7.3. Relationship between α, z_obt, and p. (“Relationship between alpha, z-obt, and p” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

When the null hypothesis is rejected, the effect is said to have statistical significance , or be statistically significant. For example, in the Physicians’ Reactions case study, the probability value is .0057. Therefore, the effect of obesity is statistically significant and the null hypothesis that obesity makes no difference is rejected. It is important to keep in mind that statistical significance means only that the null hypothesis of exactly no effect is rejected; it does not mean that the effect is important, which is what “significant” usually means. When an effect is significant, you can have confidence the effect is not exactly zero. Finding that an effect is significant does not tell you about how large or important the effect is.

Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

The Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and although the hypothesis and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above and in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we set the standards we will use to test our hypotheses: the critical values that bound the rejection region, which depend on the directionality of the test and the significance level we choose.

Step 3: Calculate the Test Statistic and Effect Size

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic—in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same. As part of this step, we will also calculate effect size to better quantify the magnitude of the difference between our groups. Although effect size is not considered part of hypothesis testing, reporting it as part of the results is approved convention.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example A: Movie Popcorn

Our manager is looking for a difference in the mean weight of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H0: μ = 8

In this case, we don’t know if the bags will be too full or not full enough, so we use a two-tailed alternative hypothesis that there is a difference: HA: μ ≠ 8.

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in Step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = .05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z test at α = .05 are z* = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution, as shown in Figure 7.4, so we can visualize the rejection region and make sure it makes sense.

Figure 7.4. Rejection region for z * = ±1.96. (“ Rejection Region z+-1.96 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Now we come to our formal calculations. Let’s say that the manager collects data and finds that the average weight of this employee’s popcorn bags is M = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z :

z = (M − μ) / (σ / √n) = (7.75 − 8) / (0.5 / √25) = −0.25 / 0.10 = −2.50

So our test statistic is z = −2.50, which we can draw onto our rejection region distribution as shown in Figure 7.5 .

Figure 7.5. Test statistic location. (“ Test Statistic Location z-2.50 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect size gives us an idea of how large, important, or meaningful a statistically significant effect is. For mean differences like we calculated here, our effect size is Cohen’s d :

d = (M − μ)/σ

This is very similar to our formula for z , but we no longer take into account the sample size (since overly large samples can make it too easy to reject the null). Cohen’s d is interpreted in units of standard deviations, just like z . For our example:

d = (7.75 − 8.00)/0.50 = −0.50

Cohen’s d is interpreted as small, moderate, or large. Specifically, d = 0.20 is small, d = 0.50 is moderate, and d = 0.80 is large. Obviously, values can fall in between these guidelines, so we should use our best judgment and the context of the problem to make our final interpretation of size. Our effect size happens to be exactly equal to one of these, so we say that there is a moderate effect.

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weaknesses of hypothesis testing. Any time you perform a hypothesis test, whether it is statistically significant or not, you should always calculate and report the effect size.

Looking at Figure 7.5, we can see that our obtained z statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

Reject H 0 . Based on the sample of 25 bags, we can conclude that the average popcorn bag from this employee is smaller ( M = 7.75 cups) than the average weight of popcorn bags at this movie theater, and the effect size was moderate, z = −2.50, p < .05, d = 0.50.
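As a quick check, the two calculations above can be reproduced in a few lines of Python. The hypothesized mean of 8.00 cups and population standard deviation of 0.50 come from the original problem statement (not shown in this excerpt) and are assumptions here, though they are consistent with the reported z and d:

```python
from math import sqrt

# Assumed from the original problem (consistent with the reported z and d):
mu, sigma, n = 8.00, 0.50, 25   # hypothesized mean, population SD, sample size
M = 7.75                        # observed sample mean

z = (M - mu) / (sigma / sqrt(n))   # test statistic
d = (M - mu) / sigma               # Cohen's d effect size

print(z, d)  # -2.5 -0.5
```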

Example B Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit during the summer months but is allowed to vary by 1 degree in either direction. You suspect that, as a cost saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H 1 : μ > 74

You know that the most common level of significance is α = .05, so you keep that the same and know that the critical value for a one-tailed z test is z* = 1.645. To keep track of the directionality of the test and rejection region, you draw out your distribution as shown in Figure 7.6.

Figure 7.6. Rejection region. (“ Rejection Region z1.645 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


Now that you have everything set up, you spend one week collecting temperature data:

Day          Temp
Monday       77
Tuesday      76
Wednesday    74
Thursday     78
Friday       78

You find the average of the five temperatures, M = 76.6 degrees, and substitute it into the formula for z:

z = (M − μ)/(σ/√n) = (76.6 − 74)/(1/√5) = 2.60/0.45 ≈ 5.77

This value falls so far into the tail that it cannot even be plotted on the distribution ( Figure 7.7 )! Because the result is significant, you also calculate an effect size:

d = (M − μ)/σ = (76.6 − 74)/1.00 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!

Figure 7.7. Obtained z statistic. (“ Obtained z5.77 ” by Judy Schmitt is licensed under CC BY-NC-SA 4.0 .)


You compare your obtained z  statistic, z = 5.77, to the critical value, z * = 1.645, and find that z > z *. Therefore you reject the null hypothesis, concluding:

Reject H 0 . Based on 5 observations, the average temperature ( M = 76.6 degrees) is statistically significantly higher than it is supposed to be, and the effect size was large, z = 5.77, p < .05, d = 2.60.
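A minimal Python sketch of this example, assuming (as the narrative implies) a set point of μ = 74 degrees and treating the allowed 1-degree variation as the population standard deviation:

```python
from math import sqrt
from statistics import mean

# Daily temperatures collected over one week (from the example)
temps = [77, 76, 74, 78, 78]
mu = 74        # hypothesized mean (the building's set point)
sigma = 1      # assumed population SD (the 1-degree allowed variation)

M = mean(temps)                           # sample mean
z = (M - mu) / (sigma / sqrt(len(temps))) # test statistic
d = (M - mu) / sigma                      # Cohen's d

print(M, round(d, 2))  # 76.6 2.6
# z comes out near 5.8; the text reports 5.77 after rounding intermediate values
```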

Example C Different Significance Level

Finally, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = .01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value:

H 0 : μ = 60.00

We will assume a two-tailed test:

H 1 : μ ≠ 60.00

We have seen the critical values for z tests at the α = .05 level of significance several times. To find the values for α = .01, we will go to the Standard Normal Distribution Table and find the z score cutting off .005 (.01 divided by 2 for a two-tailed test) of the area in the tail, which is z* = ±2.575. Notice that this cutoff is much higher than it was for α = .05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, it will take a much larger effect or a much larger sample size to reject the null hypothesis.

We can now calculate our test statistic. We will use σ = 10 as our known population standard deviation and a sample of n = 10 scores to calculate our sample mean.

The average of these scores is M = 60.40. From this we calculate our z statistic as:

z = (M − μ)/(σ/√n) = (60.40 − 60.00)/(10/√10) = 0.40/3.16 = 0.13

The Cohen’s d effect size calculation is:

d = (60.40 − 60.00)/10 = 0.04

Our obtained z  statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

Fail to reject H 0 . Based on the sample of 10 scores, we cannot conclude that there is an effect causing the mean ( M  = 60.40) to be statistically significantly different from 60.00, z = 0.13, p > .01, d = 0.04, and the effect size supports this interpretation.
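Because only summary statistics enter the formulas, this result can be checked in Python without the raw scores:

```python
from math import sqrt

# Example C: only summary statistics are needed
mu, sigma, n = 60.00, 10, 10   # null value, known population SD, sample size
M = 60.40                       # sample mean computed from the 10 scores

z = (M - mu) / (sigma / sqrt(n))   # test statistic
d = (M - mu) / sigma               # Cohen's d

print(round(z, 2), round(d, 2))  # 0.13 0.04
```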

Other Considerations in Hypothesis Testing

There are several other considerations we need to keep in mind when performing hypothesis testing.

Errors in Hypothesis Testing

In the Physicians’ Reactions case study, the probability value associated with the significance test is .0057. Therefore, the null hypothesis was rejected, and it was concluded that physicians intend to spend less time with obese patients. Despite the low probability value, it is possible that the null hypothesis of no true difference between obese and average-weight patients is true and that the large difference between sample means occurred by chance. If this is the case, then the conclusion that physicians intend to spend less time with obese patients is in error. This type of error is called a Type I error. More generally, a Type I error occurs when a significance test results in the rejection of a true null hypothesis.

The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error . Unlike a Type I error, a Type II error is not really an error. When a statistical test is not significant, it means that the data do not provide strong evidence that the null hypothesis is false. Lack of significance does not support the conclusion that the null hypothesis is true. Therefore, a researcher should not make the mistake of incorrectly concluding that the null hypothesis is true when a statistical test was not significant. Instead, the researcher should consider the test inconclusive. Contrast this with a Type I error in which the researcher erroneously concludes that the null hypothesis is false when, in fact, it is true.

A Type II error can only occur if the null hypothesis is false. If the null hypothesis is false, then the probability of a Type II error is called β (“beta”). The probability of correctly rejecting a false null hypothesis equals 1 − β and is called statistical power. Power is simply our ability to correctly detect an effect that exists. It is influenced by the size of the effect (larger effects are easier to detect), the significance level we set (making it easier to reject the null makes it easier to detect an effect, but increases the likelihood of a Type I error), and the sample size used (larger samples make it easier to reject the null).
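The way power depends on effect size, sample size, and significance level can be sketched numerically for a two-tailed one-sample z test. This is an illustration rather than part of the chapter; the effect size d = 0.5 and n = 25 below are hypothetical values:

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed_z(d, n, alpha=0.05):
    """Power of a two-tailed one-sample z test when the true effect size
    is d (in standard-deviation units) and the sample size is n."""
    sn = NormalDist()
    z_crit = sn.inv_cdf(1 - alpha / 2)   # e.g., 1.96 for alpha = .05
    shift = d * sqrt(n)                  # mean of z under the alternative
    # Probability the test statistic lands in either rejection region
    return sn.cdf(-z_crit - shift) + (1 - sn.cdf(z_crit - shift))

# A moderate effect (d = 0.5) with n = 25 bags
print(round(power_two_tailed_z(0.5, 25), 3))  # 0.705
```

Increasing n or d, or loosening α, raises the value returned, matching the three influences listed above.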

Misconceptions in Hypothesis Testing

Misconceptions about significance testing are common. This section lists three important ones.

  • Misconception: The probability value ( p value) is the probability that the null hypothesis is false. Proper interpretation: The probability value ( p value) is the probability of a result as extreme or more extreme given that the null hypothesis is true. It is the probability of the data given the null hypothesis. It is not the probability that the null hypothesis is false.
  • Misconception: A low probability value indicates a large effect. Proper interpretation: A low probability value indicates that the sample outcome (or an outcome more extreme) would be very unlikely if the null hypothesis were true. A low probability value can occur with small effect sizes, particularly if the sample size is large.
  • Misconception: A non-significant outcome means that the null hypothesis is probably true. Proper interpretation: A non-significant outcome means that the data do not conclusively demonstrate that the null hypothesis is false.
Exercises

  • In your own words, explain what the null hypothesis is.
  • What are Type I and Type II errors?
  • Why do we phrase null and alternative hypotheses with population parameters and not sample means?
  • Why do we state our hypotheses and decision criteria before we collect our data?
  • Why do you calculate an effect size?
  • z = 1.99, two-tailed test at α = .05
  • z = 0.34, z * = 1.645
  • p = .03, α = .05
  • p = .015, α = .01

Answers to Odd-Numbered Exercises

Your answer should include mention of the baseline assumption of no difference between the sample and the population.

Alpha is the significance level. It is the criterion we use when deciding to reject or fail to reject the null hypothesis, corresponding to a given proportion of the area under the normal distribution and a probability of finding extreme scores assuming the null hypothesis is true.

We always calculate an effect size to see if our research is practically meaningful or important. NHST (null hypothesis significance testing) is influenced by sample size but effect size is not; therefore, they provide complementary information.


(“Null Hypothesis” by Randall Munroe/xkcd.com is licensed under CC BY-NC 2.5.)


Introduction to Statistics in the Psychological Sciences Copyright © 2021 by Linda R. Cote Ph.D.; Rupa G. Gordon Ph.D.; Chrislyn E. Randell Ph.D.; Judy Schmitt; and Helena Marvin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Statistics By Jim

Making statistics intuitive

Critical Value: Definition, Finding & Calculator

By Jim Frost

What is a Critical Value?

A critical value defines regions in the sampling distribution of a test statistic. These values play a role in both hypothesis tests and confidence intervals. In hypothesis tests, critical values determine whether the results are statistically significant. For confidence intervals, they help calculate the upper and lower limits.

In both cases, critical values account for uncertainty in sample data you’re using to make inferences about a population . They answer the following questions:

  • How different does the sample estimate need to be from the null hypothesis to be statistically significant?
  • What is the margin of error (confidence interval) around the sample estimate of the population parameter ?

In this post, I’ll show you how to find critical values, use them to determine statistical significance, and use them to construct confidence intervals. I also include a critical value calculator at the end of this article so you can apply what you learn.

Because most people start learning with the z-test and its test statistic, the z-score, I’ll use them for the examples throughout this post. However, I provide links with detailed information for other types of tests and sampling distributions.

Related posts : Sampling Distributions and Test Statistics

Using a Critical Value to Determine Statistical Significance

Diagram showing critical region in a distribution.

In this context, the sampling distribution of a test statistic defines the probability for ranges of values. The significance level (α) specifies the probability that corresponds with the critical value within the distribution. Let’s work through an example for a z-test.

The z-test uses the z test statistic. For this test, the z-distribution finds probabilities for ranges of z-scores under the assumption that the null hypothesis is true. For a z-test, the null z-score is zero, which is at the central peak of the sampling distribution. This sampling distribution centers on the null hypothesis value, and the critical values mark the minimum distance from the null hypothesis required for statistical significance.

Critical values depend on your significance level and whether you’re performing a one- or two-sided hypothesis test. For these examples, I’ll use a significance level of 0.05. This value defines how improbable the test statistic must be to be significant.

Related posts : Significance Levels and P-values and Z-scores

Two-Sided Tests

Two-sided hypothesis tests have two rejection regions. Consequently, you’ll need two critical values that define them. Because there are two rejection regions, we must split our significance level in half. Each rejection region has a probability of α / 2, making the total likelihood for both areas equal the significance level.

The probability plot below displays the critical values and the rejection regions for a two-sided z-test with a significance level of 0.05. When the z-score is ≤ -1.96 or ≥ 1.96, it exceeds the cutoff, and your results are statistically significant.

Graph that displays critical values for a two-sided test.

One-Sided Tests

One-tailed tests have one rejection region and, hence, only one critical value. The total α probability goes into that one side. The probability plots below display these values for right- and left-sided z-tests. These tests can detect effects in only one direction.

Graph that displays a critical value for a right-sided test.
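Both kinds of cutoff can be computed with Python’s standard library rather than read from a table; statistics.NormalDist().inv_cdf inverts the standard normal CDF:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

# Two-sided test: split alpha across both tails
two_sided = z.inv_cdf(1 - alpha / 2)   # ≈ 1.96

# One-sided tests: all of alpha goes into a single tail
right_sided = z.inv_cdf(1 - alpha)     # ≈ 1.645
left_sided = z.inv_cdf(alpha)          # ≈ -1.645

print(round(two_sided, 2), round(right_sided, 3), round(left_sided, 3))
```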

Related post : Understanding One-Tailed and Two-Tailed Hypothesis Tests and Effects in Statistics

Using a Critical Value to Construct Confidence Intervals

Confidence intervals use the same critical values (CVs) as the corresponding hypothesis test. The confidence level equals 1 – the significance level. Consequently, the CVs for a significance level of 0.05 produce a confidence level of 1 – 0.05 = 0.95 or 95%.

For example, to calculate the 95% confidence interval for our two-tailed z-test with a significance level of 0.05, use the CVs of -1.96 and 1.96 that we found above.

To calculate the upper and lower limits of the interval, take the positive critical value and multiply it by the standard error of the mean. Then take the sample mean and add and subtract that product from it.

  • Lower Limit = Sample Mean – (CV * Standard Error of the Mean)
  • Upper Limit = Sample Mean + (CV * Standard Error of the Mean)
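As a sketch, here is how those two formulas look in Python; the sample mean, population standard deviation, and sample size below are made-up numbers purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary (illustration only)
sample_mean = 50.0
sigma = 12.0      # known population standard deviation
n = 36

cv = NormalDist().inv_cdf(0.975)   # critical value for a 95% interval (≈ 1.96)
sem = sigma / sqrt(n)              # standard error of the mean

lower = sample_mean - cv * sem
upper = sample_mean + cv * sem
print(round(lower, 2), round(upper, 2))  # 46.08 53.92
```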

To learn more about confidence intervals and how to construct them, read my posts about Confidence Intervals and How Confidence Intervals Work .

Related post : Standard Error of the Mean

How to Find a Critical Value

Unfortunately, the formulas for finding critical values are very complex. Typically, you don’t calculate them by hand. For the examples in this article, I’ve used statistical software to find them. However, you can also use statistical tables.

To learn how to use these critical value tables, read my articles that contain the tables and information about using them. The process for finding them is similar for the various tests. Using these tables requires knowing the correct test statistic, the significance level, the number of tails, and, in most cases, the degrees of freedom.

The following articles provide the statistical tables, explain how to use them, and visually illustrate the results.

  • T distribution table
  • Chi-square table

Related post : Degrees of Freedom

Critical Value Calculator

Another method for finding CVs is to use a critical value calculator, such as the one below. These calculators are handy for finding the answer, but they don’t provide the context for the results.

This calculator finds critical values for the sampling distributions of common test statistics.

For example, choose the following in the calculator:

  • Z (standard normal)
  • Significance level = 0.05

The calculator will display the same ±1.96 values we found earlier in this article.


Reader Interactions


January 16, 2024 at 5:26 pm

Hello, I am currently taking statistics and am reviewing confidence intervals. I would like to know what is the equation for calculating a two-tailed test for upper and lower limits? I would like to know is there a way to calculate one and two-tailed tests without using a confidence interval calculator and can you explain further?


January 16, 2024 at 6:43 pm

If you’re talking about calculating the critical values of a test statistic for a two-tailed test, the calculations are fairly complex. Consequently, you’ll either use statistical software, an online calculator, or a statistical table to find those limits.



Hypothesis Testing for Means & Proportions


Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests

Type I and Type II Errors


The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

 

Upper-tailed, Lower-tailed, Two-tailed Tests

The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize:  

H 1 : μ > μ 0 , where μ 0 is the comparator or null value (e.g., μ 0 =191 in our example about weight in men in 2006) and an increase is hypothesized - this type of test is called an upper-tailed test;

H 1 : μ < μ 0 , where a decrease is hypothesized and this is called a lower-tailed test; or

H 1 : μ ≠ μ 0 , where a difference is hypothesized and this is called a two-tailed test.

The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.

 

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:

Z = (X̄ − μ 0 ) / (s/√n)

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α=0.05

The decision rule is: Reject H 0 if Z > 1.645.

 

 

α          Z
0.10       1.282
0.05       1.645
0.025      1.960
0.010      2.326
0.005      2.576
0.001      3.090
0.0001     3.719

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.

α          Z
0.10       -1.282
0.05       -1.645
0.025      -1.960
0.010      -2.326
0.005      -2.576
0.001      -3.090
0.0001     -3.719

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

α          Z
0.20       1.282
0.10       1.645
0.05       1.960
0.010      2.576
0.001      3.291
0.0001     3.819

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."
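The three decision rules above can be collected into a short Python sketch (illustrative, not part of the module):

```python
def reject_null(z, critical, tail):
    """Apply the decision rule for a z test.

    tail is 'upper', 'lower', or 'two'; critical is the positive critical value.
    """
    if tail == "upper":
        return z > critical           # reject when Z exceeds the critical value
    if tail == "lower":
        return z < -critical          # reject when Z falls below the negative cutoff
    if tail == "two":
        return abs(z) > critical      # reject when Z is extreme in either direction
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

print(reject_null(2.38, 1.645, "upper"))  # True
print(reject_null(-1.50, 1.960, "two"))   # False
```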

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (Step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α = 0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .

 

 

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191 H 1 : μ > 191                 α =0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is

Z = (X̄ − μ 0 ) / (s/√n)

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

We now substitute the sample data into the formula for the test statistic identified in Step 2, which gives Z = 2.38.

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight in men in 2006 is more than 191 pounds.

Because we rejected the null hypothesis, we now approximate the p-value which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
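The approximation can be confirmed directly: the exact upper-tail p-value for the observed Z = 2.38 does fall between 0.005 and 0.010.

```python
from statistics import NormalDist

# Exact upper-tail p-value for the observed test statistic Z = 2.38
p = 1 - NormalDist().cdf(2.38)
print(round(p, 4))  # 0.0087 -- between 0.005 and 0.010, so we report p < 0.010
```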

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

                    Do Not Reject H 0      Reject H 0
H 0 is True         Correct Decision       Type I Error
H 0 is False        Type II Error          Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H 0 , then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.


Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH

Critical Region and Confidence Interval


Confidence Interval

A confidence interval, also known as the acceptance region, is the set of values of the test statistic for which the null hypothesis is accepted: if the observed test statistic lies in the confidence interval, we accept the null hypothesis and reject the alternative hypothesis.

Significance Levels

Confidence intervals can be calculated at different significance levels . We use $\alpha$ to denote the level of significance and perform a hypothesis test with a $100(1- \alpha)$% confidence interval.

Confidence intervals are usually calculated at $5$% or $1$% significance levels, for which $\alpha = 0.05$ and $\alpha = 0.01$ respectively. Note that a $95$% confidence interval does not mean there is a $95$% chance that the true value being estimated is in the calculated interval. Rather, given a population, there is a $95$% chance that choosing a random sample from this population results in a confidence interval which contains the true value being estimated.
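This sampling interpretation can be illustrated with a short Python simulation: draw many random samples from a known population and count how often the known-$\sigma$ $95$% interval captures the true mean. The population mean, standard deviation, and sample size below are arbitrary choices for the illustration:

```python
import random
from math import sqrt
from statistics import mean

random.seed(42)
mu, sigma, n = 100.0, 15.0, 30   # arbitrary population and sample size
trials = 2000

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = mean(sample)
    half_width = 1.96 * sigma / sqrt(n)   # known-sigma 95% interval
    if m - half_width <= mu <= m + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```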

Critical Region

A critical region, also known as the rejection region, is the set of values of the test statistic for which the null hypothesis is rejected: if the observed test statistic lies in the critical region, we reject the null hypothesis and accept the alternative hypothesis.

Critical Values

The critical value at a certain significance level can be thought of as a cut-off point. If a test statistic on one side of the critical value results in accepting the null hypothesis, a test statistic on the other side will result in rejecting the null hypothesis.

Constructing a Confidence Interval

Binomial Distribution

Usually, the easiest way to perform a hypothesis test with the binomial distribution is to use the $p$-value and see whether it is larger or smaller than $\alpha$, the significance level used.

Sometimes, if we have observed a large number of Bernoulli Trials, we can use the observed probability of success $\hat{p}$, based entirely on the data obtained, to approximate the distribution of error using the normal distribution. We do this using the formula \[\hat{p} \pm z_{1-\frac{\alpha}{2} } \sqrt{ \frac{1}{n} \hat{p} (1-\hat {p})}\] where $\hat{p}$ is the estimated probability of success, $z_{1- \frac{\alpha}{2} }$ is obtained from the normal distribution tables , $\alpha$ is the significance level and $n$ is the sample size.

Worked Example

A coin is tossed $1050$ times and lands on heads $500$ times. Construct a $90$% confidence interval for the probability $p$ of getting a head.

Here the observed probability of success $\hat{p} = \dfrac{500}{1050}$, $n=1050$ and $\alpha = 0.1$ so $z_{1-\frac{\alpha}{2} } = z_{0.95} = 1.645$. This is because $\Phi^{-1} (0.95) = 1.645$ .

So the confidence interval will be between $\hat{p} - z_{1-\frac{\alpha}{2} } \sqrt{ \frac{1}{n} \hat{p} (1-\hat{p})} \text{ and } \hat{p} + z_{1-\frac{\alpha}{2} } \sqrt{ \frac{1}{n} \hat{p} (1-\hat{p})}.$ By substituting into these expressions, we find that the confidence interval is between \begin{align} &\dfrac{500}{1050} - 1.645 \sqrt{ \frac{1}{1050} \times \dfrac{500}{1050} \times \left(1- \dfrac{500}{1050}\right) }\\ \text{ and } &\dfrac{500}{1050} + 1.645 \sqrt{ \frac{1}{1050} \times \dfrac{500}{1050} \times \left(1- \dfrac{500}{1050}\right) }\\\\ &=0.47619 - (1.645 \times \sqrt{0.0002376} ) \text{ and } 0.47619 + (1.645 \times \sqrt{0.0002376} ) \\ &=0.45084 \text{ and } 0.50154 . \end{align} So the confidence interval is $(0.45084, 0.50154)$.
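As a quick check of the worked example, the same interval can be computed in a few lines of Python (a sketch; the table value $z = 1.645$ is hard-coded, as in the text):

```python
import math

# 90% Wald confidence interval for p, following the worked example above.
n = 1050
successes = 500
p_hat = successes / n          # observed proportion of heads
z = 1.645                      # z_{1 - alpha/2} for alpha = 0.1, from the tables

half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half_width, p_hat + half_width
print(f"({lower:.5f}, {upper:.5f})")  # -> (0.45084, 0.50154)
```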

Normal Distribution

We can use either the $z$-score or the sample mean $\bar{x}$ as the test statistic. If the $z$-score is used then reading straight from the tables gives the critical values.

For example, the critical values for a $5$% significance test are $\pm 1.96$ for a two-tailed test, and $1.645$ (upper tail) or $-1.645$ (lower tail) for a one-tailed test.

To obtain a confidence interval for the mean, use the following procedure:

For a two-tailed test with a $5$% significance level we need to consider \begin{align} 0.95 &= \mathrm{P}[-k< Z < k] \\ &= \mathrm{P}\left[-k<\dfrac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n} } }<k\right], \end{align} which gives $k = 1.96$. Rearranging, the null hypothesis is not rejected when \[ \mu-1.96\frac{\sigma}{\sqrt{n} } < \bar{X} < \mu+1.96\frac{\sigma}{\sqrt{n} }. \] Equivalently, a $95$% confidence interval for the mean is $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.
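A minimal numerical sketch of this procedure, assuming a known $\sigma$; the values of `sigma`, `n`, and `x_bar` below are made up purely for illustration:

```python
import math

# Two-tailed 95% confidence interval for the mean with known sigma.
# sigma, n, and x_bar are illustrative values, not from the text.
sigma, n, x_bar = 4.0, 64, 101.2
k = 1.96  # Phi^{-1}(0.975) from the tables

margin = k * sigma / math.sqrt(n)   # 1.96 * 4 / 8 = 0.98
ci = (x_bar - margin, x_bar + margin)
print(f"({ci[0]:.2f}, {ci[1]:.2f})")  # -> (100.22, 102.18)
```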

Student $t$-distribution

Given the number of degrees of freedom $v$ and the significance level $\alpha$, the critical values can be obtained from the tables. Critical regions can then be computed from these.

If we are performing a hypothesis test at a $1$% significance level with $15$ degrees of freedom using the Student $t$-distribution then there are three cases, depending on the alternative hypothesis.

If we are performing a two-tailed test, the critical values are $\pm2.9467$, so the non-rejection region is $-2.9467 \leq t \leq 2.9467$ where $t$ is the test statistic. The critical regions will be $t< -2.9467$ and $t>2.9467$.

If we are performing a one-tailed test, the critical value is $2.6025$, so the critical region is $t>2.6025$ for an upper-tailed test (or $t<-2.6025$ for a lower-tailed test).
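These critical values can be reproduced with `scipy.stats.t.ppf` (a sketch; the quoted table values are rounded to four decimal places):

```python
from scipy import stats

# Critical values quoted above for the Student t-distribution with
# 15 degrees of freedom at a 1% significance level.
df = 15

two_tailed = stats.t.ppf(1 - 0.01 / 2, df)  # upper 0.5% point
one_tailed = stats.t.ppf(1 - 0.01, df)      # upper 1% point
print(round(two_tailed, 4), round(one_tailed, 4))  # -> 2.9467 2.6025
```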


Confidence intervals and hypothesis testing

  • Understand the t value and Pr(>|t|) fields in the output of lm
  • Be able to think critically about the meaning and limitations of strict hypothesis tests

Confidence intervals and hypothesis tests

T-statistics.

Suppose we’re interested in the value \(\beta_k\) , the \(k\) –th entry of \(\betav\) , for some regression \(\y_n \sim \betav^\trans \xv_n\) . Recall that we have been finding \(\v\) such that

\[ \sqrt{N} (\betahat_k - \beta_k) \rightarrow \gauss{0, \v}. \]

For example, under homoskedastic assumptions with \(\y_n = \xv_n^\trans \beta + \res_n\) , we have

\[ \begin{aligned} \v =& \sigma^2 (\Xcov^{-1})_{kk} \textrm{ where } \\ \Xcov =& \lim_{N \rightarrow \infty} \frac{1}{N} \X^\trans \X \textrm{ and } \\ \sigma^2 =& \var{\res_n}. \end{aligned} \]

Typically we don’t know \(\v\) , but have \(\hat\v\) such that \(\hat\v \rightarrow \v\) as \(N \rightarrow \infty\) . Again, under homoskedastic assumptions,

\[ \begin{aligned} \hat\v =& \hat\sigma^2 \left(\frac{1}{N} \X^\trans \X \right)_{kk} \textrm{ where } \\ \hat\sigma^2 =& \frac{1}{N-P} \sumn \reshat_n^2. \end{aligned} \]

Putting all this together, the quantity

\[ \t = \frac{\sqrt{N} (\betahat_k - \beta_k)}{\sqrt{\hat\v}} = \frac{\betahat_k - \beta_k}{\sqrt{\hat\v / N}} \]

has an approximately standard normal distribution for large \(N\) .

Quantities of this form are called “T–statistics,” since, under our normal assumptions, we have shown that

\[ \t \sim \studentt{N-P}, \]

exactly for all \(N\) . Despite its name, it’s worth remembering that a T–statistic is actually not Student T distributed in general; it is asymptotically normal. Recall that for large \(N\) , the Student T and standard normal distributions coincide.
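As an illustrative sketch (not the output of `lm` itself), the T–statistic under homoskedastic assumptions can be computed by hand with NumPy; the simulated model and all names below are assumptions of the example:

```python
import numpy as np

# Sketch: the T-statistic for one regression coefficient under
# homoskedastic assumptions, following the formulas above.
rng = np.random.default_rng(0)
N, P = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # intercept + one regressor
beta_true = np.array([0.5, 1.0])
y = X @ beta_true + rng.normal(size=N)                 # simulated responses

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - P)            # hat sigma^2 with N - P correction
cov = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated covariance of beta_hat
t_stat = beta_hat[1] / np.sqrt(cov[1, 1])       # T-statistic for the null beta_k^0 = 0
print(t_stat)
```

With a strong true coefficient, the statistic is far out in the tail of the standard normal, so the null \(\beta_k^0 = 0\) would be rejected.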

Plugging in values for \(\beta_k\)

However, there’s something funny about a “T-statistic” — as written, you cannot compute it, because you don’t know \(\beta_k\) . In fact, finding what values \(\beta_k\) might plausibly take is the whole point of statistical inference.

So what good is a T–statistic? Informally, one way to reason about it is as follows. Let’s take some concrete values for an example. Suppose we guess that the value is \(\beta_k^0\) , and compute

\[ \betahat_k = 2 \quad\textrm{and}\quad \sqrt{\hat\v / N} = 3 \quad\textrm{so}\quad \t = \frac{2 - \beta_k^0}{3}. \]

We use the superscript \(0\) to indicate that \(\beta_k^0\) is our guess, not necessarily the true value.

Suppose we plug in some particular value, such as \(\beta_k^0 = 32\) . Using this value, we compute our T–statistic, and find that it’s very large — in our example, we would have \(\t = (2 - 32) / 3 = -30\) . It’s very unlikely to get a standard normal (or Student T) draw this large. Therefore, either:

  • We got a very (very very very very) unusual draw of our standard normal or
  • We guessed wrong, i.e.  \(\beta_k \ne \beta_k^0 = 32\) .

In this way, we might consider it plausible to “reject” the hypothesis that \(\beta_k = 32\) .

There’s a subtle problem with the preceding reasoning, however. Suppose we do the same calculation with \(\beta_k^0 = 1\) . Then \(\t = (2 - 1) / 3 = 1/3\) . This is a much more typical value for a standard normal distribution. However, the probability of getting exactly \(1/3\) — or, indeed, any particular value — is zero, since the normal distribution is continuous valued. (This problem is easiest to see with continuous random variables, but the same basic problem will occur when the distribution is discrete but spread over a large number of possible values.)

Rejection regions

To resolve this problem, we can specify regions that we consider implausible. That is, suppose we take a region \(R\) such that, if \(\t\) is standard normal (or Student-T), then

\[ \prob{\t \in R} \le \alpha \quad\textrm{for some small }\alpha. \]

For example, we might take \(\Phi^{-1}(\cdot)\) to be the inverse CDF of \(\t\) if \(\beta_k = \beta_k^0\) . Then we can take

\[ R_{ts} = \{\t: \abs{\t} \ge q \} \quad\textrm{where } q = \Phi^{-1}(1 - \alpha / 2) \]

where \(q\) is the \(1 - \alpha / 2\) quantile of the distribution of \(\t\) . But there are other choices, such as

\[ \begin{aligned} R_{u} ={}& \{\t: \t \ge q \} \quad\textrm{where } q = \Phi^{-1}(1 - \alpha) \\ R_{l} ={}& \{\t: \t \le q \} \quad\textrm{where } q = \Phi^{-1}(\alpha) \\ R_{m} ={}& \{\t: \abs{\t} \le q \} \quad\textrm{where } q = \Phi^{-1}(0.5 + \alpha / 2) \quad\textrm{(!!!)}\\ R_{\infty} ={}& \begin{cases} \emptyset & \textrm{ with independent probability } \alpha \\ (-\infty,\infty) & \textrm{ with independent probability } 1 - \alpha \\ \end{cases} \quad\textrm{(!!!)} \end{aligned} \]

The last two may seem silly, but they are still rejection regions into which \(\t\) is unlikely to fall if it has a standard normal distribution.
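A Monte Carlo sketch of this claim: under the null, each of the three deterministic regions above rejects about a fraction \(\alpha\) of the time (the seed and number of draws below are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

# Under the null, T-statistics are (approximately) standard normal draws.
rng = np.random.default_rng(1)
alpha, draws = 0.05, 200_000
t = rng.normal(size=draws)

reject_ts = np.abs(t) >= norm.ppf(1 - alpha / 2)   # two-sided region R_ts
reject_u = t >= norm.ppf(1 - alpha)                # upper one-sided region R_u
reject_m = np.abs(t) <= norm.ppf(0.5 + alpha / 2)  # the "silly" middle region R_m

for r in (reject_ts, reject_u, reject_m):
    print(round(r.mean(), 3))  # each rejection rate is close to 0.05
```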

How can we think about \(\alpha\) , and about the choice of the region? Recall that

  • If \(\t \in R\) , we “reject” the proposed value of \(\beta_k^0\)
  • If \(\t \notin R\) , we “fail to reject” the given value of \(\beta_k^0\) .

Of course, we don’t “accept” the value of \(\beta_k^0\) in the sense of believing that \(\beta_k^0 = \beta_k\) — if nothing else, there will always be multiple values of \(\beta_k^0\) that we do not reject, and \(\beta_k\) cannot be equal to all of them.

So there are two ways to make an error:

  • Type I error: We are correct and \(\beta_k = \beta_k^0\) , but \(\t \in R\) and we reject
  • Type II error: We are incorrect and \(\beta_k \ne \beta_k^0\) , but \(\t \notin R\) and we fail to reject

By definition of the region \(R\) , we have that

\[ \prob{\textrm{Type I error}} \le \alpha. \]

This is true for all the regions above, including the silly ones!

What about the Type II error? It must depend on the “true” value of \(\beta_k\) , and on the shape of the rejection region we choose. Note that

\[ \t = \frac{\betahat_k - \beta_k^0}{\sqrt{\hat\v / N}} = \frac{\betahat_k - \beta_k}{\sqrt{\hat\v / N}} + \frac{\beta_k - \beta_k^0}{\sqrt{\hat\v / N}} \]

So if the true value \(\beta_k \gg \beta_k^0\) , then our \(\t\) statistic is too large, and so on.

For example:

  • If \(\beta_k \gg \beta_k^0\):
      • Then \(\t\) is too large and positive.
      • \(R_u\) and \(R_{ts}\) will reject, but \(R_l\) will not.
      • The Type II error of \(R_u\) will be lowest, then \(R_{ts}\) , then \(R_l\) .
      • \(R_l\) actually has greater Type II error than the silly regions, \(R_\infty\) and \(R_m\) .
  • If \(\beta_k \ll \beta_k^0\):
      • Then \(\t\) is too large and negative.
      • \(R_l\) and \(R_{ts}\) will reject, but \(R_u\) will not.
      • The Type II error of \(R_l\) will be lowest, then \(R_{ts}\) , then \(R_u\) .
      • \(R_u\) actually has greater Type II error than the silly regions, \(R_\infty\) and \(R_m\) .
  • If \(\beta_k \approx \beta_k^0\):
      • Then \(\t\) has about the same distribution as when \(\beta_k^0 = \beta_k\) .
      • All the regions reject just about as often as we commit a Type I error, that is, a proportion \(\alpha\) of the time.

Thus the shape of the region determines which alternatives you are able to reject. The probability of “rejecting” under a particular alternative is called the “power” of a test; the power is one minus the Type II error rate.

The null and alternative

Statistics has some formal language to distinguish between the “guess” \(\beta_k^0\) and other values. The hypothesized value \(\beta_k^0\) is called the null hypothesis, and the remaining candidate values are called the alternative hypothesis.

  • Falsely rejecting the null hypothesis is called a Type I error
  • By construction, Type I errors occur with probability at most \(\alpha\)
  • Falsely failing to reject the null hypothesis is called a Type II error
  • Type II errors’ probability depends on the alternative(s) and the rejection region shape.

The choice of a test statistic (here, \(\t\) ), together with a rejection region (here, \(R\) ) constitute a “test” of the null hypothesis. In general, one can imagine constructing many different tests, with different theoretical guarantees and power.

Confidence intervals

Often in applied statistics, a big deal is made about a single hypothesis test, particularly the null that \(\beta_k^0 = 0\) . Often this is not a good idea. Typically, we do not care whether \(\beta_k\) is precisely zero; rather, we care about the set of plausible values \(\beta_k\) might take. The distinction can be expressed as the difference between statistical and practical significance:

  • Statistical significance is the size of an effect relative to sampling variability
  • Practical significance is the size of the effect in terms of its effect on reality.

For example, suppose that \(\beta_k\) is nonzero but very small, but \(\sqrt{\hat\v / N}\) is very small, too. We might reject the null hypothesis \(\beta_k^0 = 0\) with a high degree of certainty, and call our result statistically significant . However, a small value of \(\beta_k\) may still not be a meaningful effect size for the problem at hand, i.e., it may not be practically significant .

A remedy is confidence intervals, which are actually closely related to our hypothesis tests. Recall that we have been constructing intervals of the form

\[ \prob{\beta_k \in I} \ge 1-\alpha \]

\[ I = \left(\betahat_k \pm q \sqrt{\hat\v / N}\right), \]

where \(q = \Phi^{-1}(1 - \alpha / 2)\) , and \(\Phi\) is the CDF of either the standard normal or Student T distribution. It turns out that \(I\) is precisely the set of values that we would not reject with region \(R_{ts}\) . And, indeed, given a confidence interval, a valid test of the hypothesis \(\beta_k^0\) is given by rejecting if and only if \(\beta_k^0 \notin I\) .

This duality is entirely general:

  • The set of values that a valid test does not reject is a valid confidence interval
  • Checking whether a value falls in a valid confidence interval is a valid test
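A small sketch of this duality, reusing the made-up numbers \(\betahat_k = 2\) and \(\sqrt{\hat\v / N} = 3\) from the earlier example:

```python
from scipy.stats import norm

# Duality between the two-sided test and the confidence interval,
# with illustrative values beta_hat = 2 and standard error 3.
beta_hat, se, alpha = 2.0, 3.0, 0.05
q = norm.ppf(1 - alpha / 2)  # about 1.96
ci = (beta_hat - q * se, beta_hat + q * se)

def reject(beta0):
    # two-sided test with region R_ts
    return abs((beta_hat - beta0) / se) >= q

print(ci)                     # roughly (-3.88, 7.88)
print(reject(32), reject(1))  # -> True False
```

The rejected value 32 lies outside the interval, while the non-rejected value 1 lies inside it, exactly as the duality predicts.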


Rejection Region in Hypothesis Testing

What is the definition of a rejection region?

A rejection region is a section of a graph where the null hypothesis is rejected (assuming your test results fall into that area).

Rejection Region

The primary goal of statistics is to test theories or experiment results.

For example, you may have developed a novel fertilizer that you believe accelerates plant growth by 50%.

To demonstrate that your hypothesis is correct, your experiment must:

Be consistent.

Be compared to a well-known fact about plants (in this example, probably the average growth rate of plants without the fertilizer).

This sort of statistical testing is known as a hypothesis test.


The testing process includes a rejection region (also known as a critical region).

It uses probability to determine whether your theory (or “hypothesis”) is likely to be correct.

Probability Distributions and Rejection Regions

The rejection regions of a two-tailed t-distribution.

A probability distribution can be used to draw every rejection region. A two-tailed t-distribution is seen in the figure above.

A rejection region can also be seen in just one tail.

Two-Tailed vs One-Tailed

Your hypothesis statement determines the type of test to use. Suppose your question is, “Is the average growth rate larger than 8cm per day?”

Because you’re only interested in one direction (greater than 8cm a day), this is a one-tailed test.

You might alternatively have a single “less than” rejection region.

For example, “Is the growth rate less than 8cm per day?” When you want to see if there’s a difference in both directions, you’ll use a two-tailed test with two regions (greater than and less than).

Alpha Levels and Rejection Regions

As a researcher, you decide what level of alpha (the risk of a Type I error) you’re willing to accept.

For example, if you wanted to be 95% confident that your results are significant, you would set a 5% alpha level (100% – 95%).

That 5% threshold is the rejection threshold. In a one-tailed test, the 5% would be in one of the tails.

The rejection zone for a two-tailed test would be in two tails.


The rejection zone is in one tail of a one-tailed test.

P-Values and Rejection Regions

A hypothesis can be tested in two ways: with a p-value or with a critical value.

p-value method: When you perform a hypothesis test (for example, a z test), the result is a p-value.

A “probability value” is the p-value. It’s what determines if your hypothesis is likely to be correct or not.

If the p-value is smaller than your alpha level, the results are statistically significant, and the null hypothesis can be rejected.

If your p-value is larger than your alpha level, your results are insufficient to reject the null hypothesis.

What is statistical significance?

A statistically significant outcome in the case of plant fertilizer would be one that shows the fertilizer does actually make plants grow quicker (compared to other fertilizers).

The steps are similar for the rejection region approach with a critical value. Instead of comparing a p-value to alpha, the test statistic is compared to a critical value.

If the test statistic falls within the rejection region, the null hypothesis is rejected.



6a.4.3 - Steps in Conducting a Hypothesis Test for \(p\)

Six Steps for a One-Sample Proportion Hypothesis Test

Steps 1-3

Let's apply the general steps for hypothesis testing to the specific case of testing a one-sample proportion.

To use the one-proportion z-test, check that \( np_0 \ge 5 \) and \( n(1-p_0) \ge 5 \).

One Proportion Z-test Hypotheses

One Proportion Z-test: \(z^*=\dfrac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \)

Rejection Region Approach

Steps 4-6

Left-tailed test: Reject \(H_0\) if \(z^* \le z_\alpha\)

Right-tailed test: Reject \(H_0\) if \(z^* \ge z_{1-\alpha}\)

Two-tailed test: Reject \(H_0\) if \(|z^*| \ge |z_{\alpha/2}|\)


These graphs show the various z-critical values for tests at an \(\alpha=.05\). *The graphs are not to scale.

Left-tailed test: Reject \(H_0\) if \(z^* \le -1.645\)

Right-tailed test: Reject \(H_0\) if \(z^* \ge 1.645\)

Two-tailed test: Reject \(H_0\) if \(|z^*| \ge 1.96\)

P-Value Approach

Example 6-5: Penn State Students from Pennsylvania

Referring back to Example 6-4, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5 at a 5% level of significance?

Conduct the test using both the rejection region and p-value approach.

  • Steps 4-6: Rejection Region
  • Steps 4-6: P-Value

Set up the hypotheses. Since the research hypothesis is to check whether the proportion is greater than 0.5 we set it up as a one (right)-tailed test:

\( H_0\colon p=0.5 \) vs \(H_a\colon p>0.5 \)

Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0 \) is \(0.5\) and we can check that: \(np_0=500(0.5)=250 \ge 5 \) and \(n(1-p_0)=500(1-0.5)=250 \ge 5 \)

According to the question, \(\alpha= 0.05 \).

\begin{align} z^*&= \dfrac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}\\z^*&=2.504 \end{align}

We can use the standard normal table to find the value of \(Z_{0.05} \). From the table, \(Z_{0.05} \) is found to be \(1.645\) and thus the critical value is \(1.645\). The rejection region for the right-tailed test is given by:

\( z^*>1.645 \)

The test statistic or the observed Z-value is \(2.504\). Since \(z^*\) falls within the rejection region, we reject \(H_0 \).

With a test statistic of \(2.504\) and critical value of \(1.645\) at a 5% level of significance, we have enough statistical evidence to reject the null hypothesis. We conclude that a majority of the students are from Pennsylvania.

Since \(\text{p-value} = 0.0062 \le 0.05\) (the \(\alpha \) value), we reject the null hypothesis.

With a test statistic of \(2.504\) and p-value of \(0.0062\), we reject the null hypothesis at a 5% level of significance. We conclude that a majority of the students are from Pennsylvania.
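Both the rejection region and p-value calculations for this example can be reproduced in a few lines (a sketch using `scipy.stats.norm`):

```python
import math
from scipy.stats import norm

# One-proportion z-test for Example 6-5 (278 of 500 students from Pennsylvania).
n, x, p0, alpha = 500, 278, 0.5, 0.05
p_hat = x / n  # 0.556

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = norm.sf(z)  # right-tailed: P(Z > z*)

print(round(z, 3))      # -> 2.504
print(p_value < alpha)  # -> True: reject H0
```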

Online Purchases

An e-commerce research company claims that 60% or more graduate students have bought merchandise online. A consumer group is suspicious of the claim and thinks that the proportion is lower than 60%. A random sample of 80 graduate students shows that only 22 students have ever done so. Is there enough evidence to show that the true proportion is lower than 60%?

Conduct the test at 10% Type I error rate and use the p-value and rejection region approaches.

Set up the hypotheses. Since the research hypothesis is to check whether the proportion is less than 0.6 we set it up as a one (left)-tailed test:

\( H_0\colon p=0.6 \) vs \(H_a\colon p<0.6 \)

Can we use the z-test statistic? The answer is yes since the hypothesized value \(p_0 \) is 0.6 and we can check that: \(np_0=80(0.6)=48 \ge 5 \) and \(n(1-p_0)=80(1-0.6)=32 \ge 5 \)

According to the question, \(\alpha= 0.1 \).

\begin{align} z^* &=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\\&=\frac{.275-0.6}{\sqrt{\frac{0.6(1-0.6)}{80}}}\\&=-5.93 \end{align}

The critical value is the value of the standard normal where 10% fall below it. Using the standard normal table, we can see that the value is -1.28.

The rejection region is any \(z^* \) such that \(z^*<-1.28 \) . Since our test statistic, -5.93, is inside the rejection region, we reject the null hypothesis.

There is enough evidence in the data provided to suggest, at 10% level of significance, that the true proportion of students who made purchases online was less than 60%.

Since our p-value is very small and less than our significance level of 10%, we reject the null hypothesis.
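Likewise, a short sketch reproducing the left-tailed calculation for this example:

```python
import math
from scipy.stats import norm

# Left-tailed one-proportion z-test for the online purchases example.
n, x, p0, alpha = 80, 22, 0.6, 0.10
p_hat = x / n  # 0.275

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = norm.cdf(z)  # left-tailed: P(Z < z*)

print(round(z, 2))                           # -> -5.93
print(z < norm.ppf(alpha), p_value < alpha)  # -> True True: reject H0 either way
```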


Confused about rejection region and P-value

I am confused about the rejection region and P-value. I thought that the P-value is simply the probability associated to the set of points where we would reject the null hypothesis (the rejection region). But according to this response, they are not related. Is it then possible, having $R$ as a test statistic, to have a rejection region, say $C=\{R^{observed}:R^{observed}\geq w_{1−\frac{\alpha}{2}}\}\cup\{R^{observed}:R^{observed}\leq w_{\frac{\alpha}{2}}\}$ , but the p-value is $P(R>R^{observed})$ ?


2 Answers

I think this will be best understood with an example. Let us work through a hypothesis test for the mean height of people in a country. We have information about the heights of a sample of people in that country. First, we define our null and alternative hypothesis:

  • $H_0: \mu \geq a$
  • $H_1: \mu < a$

And (let me change your notation) we have our test statistic $Z$ . Now, we know two things about this test statistic:

We know the formula for this statistic: $Z=\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$

We know the distribution it follows. For simplicity in this explanation, let me assume it follows a normal distribution. We would have that $Z\sim N(0,1)$ .

Now the important part: We do not know the true population value of $\mu$ (this means, we do not know the true mean height of all the people in the country, if we wanted to know that we would need to know the height of all the citizens in that country). But we can say: hey, let's assume that the value for $\mu$ is the value stated in $H_0$ (a.k.a. let's assume that $\mu=a$ )

And we pose the key question: How likely is it for $\mu$ to take the value $a$ given the information that we know from the sample data?

Now that we are assuming that $\mu=a$ , we can obtain the value of our test statistic under the null hypothesis (this is, assuming that $\mu=a$ ).

There can be two possible results:

  • If $a$ is not a likely value for $\mu$ to have then the statistic $\hat{Z}$ value will not fit well in the distribution $Z$ follows and we will reject $H_0$


  • If $a$ is a likely value for $\mu$ to have, then the statistic $\hat{Z}$ value will fit well in the distribution $Z$ follows and we will fail to reject $H_0$ :


And finally here comes into play the rejection region and the p-value.

  • We will consider that the tail of the distribution (in this case, the left tail, as stated by $H_1$) contains values that are not likely for $\hat{Z}$, so if any value is close enough to the tail, we reject $H_0$. How close to the tail? That is stated by the significance level $\alpha$. The rejection region is:

$$RR=\{Z {\ }s.t.{\ } Z < -Z_{\alpha}\}$$

If we take, for example, $\alpha=0.05$ then the rejection region is $$RR=\{Z {\ }s.t.{\ } Z < -Z_{0.05}\}= \{Z {\ }s.t.{\ } Z < -1.645\}$$

  • And the p-value is simply the probability of obtaining a value at least as extreme as the one from our sample, or in other words, if the sample value of our statistic is $\hat{Z}$ then the p-value is $$p-value=P(Z<\hat{Z})$$

In one image, in red the rejection region, and in green the p-value.


Remark : These plots have been made assuming that we are doing a left-sided test. Considering a right-sided or two-sided test would yield similar but not equal images.
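A short sketch of the left-sided case described above, with a made-up sample statistic $\hat{Z} = -2$:

```python
from scipy.stats import norm

# Left-sided test at alpha = 0.05; z_hat is an illustrative sample statistic.
alpha, z_hat = 0.05, -2.0

critical = norm.ppf(alpha)  # boundary of the rejection region, about -1.645
p_value = norm.cdf(z_hat)   # P(Z < z_hat)

print(round(critical, 3))  # -> -1.645
print(round(p_value, 4))   # -> 0.0228
print(z_hat < critical)    # -> True: inside the rejection region, so p < alpha
```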


  • $\begingroup$ Thank you for your explanation! So If we wanted to test $H_0: \mu =a$ vs $H_1:\mu \neq a$, then $RR=\{Z s.t. Z_{1-\frac{\alpha}{2}} <Z < Z_{\frac{\alpha}{2}}\}$, and the P-value would be the probability $P(|Z|<\hat{Z})$. Right? $\endgroup$ –  Toney Shields Commented Feb 9, 2021 at 10:35
  • 1 $\begingroup$ Yeah, that is right. $\endgroup$ –  Álvaro Méndez Civieta Commented Feb 9, 2021 at 10:52
  • $\begingroup$ Suppose then we want to test some other parameter $H_0: \theta = a$ vs $H_1: \theta \neq a$, and that our test statistic $T$ under the null hypothesis has a Gamma distribution $\Gamma(m,n)$. The rejection region would still the same except that the quantile changes to that of a $\Gamma(m,n)$. What I'm having trouble with is : what is the P-value in this case? $\endgroup$ –  Toney Shields Commented Feb 9, 2021 at 10:58
  • 1 $\begingroup$ Well in two sided tests as you said you can always obtain the rejection region, but I am afraid that the two sided p-value is only well defined when the test statistic has a symetric distribution. $\endgroup$ –  Álvaro Méndez Civieta Commented Feb 9, 2021 at 11:08

The rejection region is fixed beforehand. If the null hypothesis is true then some $\alpha \%$ of the observations will be in the region.

The p-value is not the same as this $\alpha \%$ .

The p-value is computed for each separate observation, and can be different for two observations that both fall inside the rejection region.

The p-value indicates how extreme* a value is. And expresses this in terms of a probability. This expression in terms of a probability could be seen as the quantile of the outcome when the potential outcomes are ranked in decreasing order of extremity. The more extreme the observation, the lower the quantile.

In short: The rejection region can be seen as the region of observations for which the associated quantile or p-value is lower than some value.

See also: https://stats.stackexchange.com/questions/tagged/critical-value

* What is and what is not considered extreme is not well defined here and might be considered arbitrary, but depending on the situation there might be good reasons to choose a particular definition. For example, think about one-sided and two-sided tests in which case different sorts of extremities are chosen.

Because of the variations in choice for 'extremeness', it might be that you encounter a situation where some observation is inside the rejection region but has a p-value that is larger. This is the case when the two use a different definition. But typically the p-value and rejection region should relate to the same definition of 'extremeness'.


  • $\begingroup$ Does it mean that the rejection region can be $C=\{R^{observed}:R^{observed}\geq w_{1−\frac{\alpha}{2}}\}\cup\{R^{observed}:R^{observed}\leq w_{\frac{\alpha}{2}}\}$, but a P-value $P(R>R^{observed})$ (meaning that "extreme" is when $R>R^{observed}$)? $\endgroup$ –  Toney Shields Commented Feb 9, 2021 at 11:02
  • 1 $\begingroup$ Ah now I see your problem. $P(R>R^{observed})$ can be very high. Say you have the hypothesis $R \sim N(0,\sigma^2)$ (ie normal distributed). The 5% rejection region could be when for the absolute value $|R|>2\sigma$. In that case, if you have an observation below $-2$, that is $R^{observed}<-2$, then the observation is inside the rejection region, but the probability to observe $R>R^{observed}$ is very high... $\endgroup$ –  Sextus Empiricus Commented Feb 9, 2021 at 11:15
  • 1 $\begingroup$ ... the discrepancy occurs because the probability $P(R>R^{observed})$ is not using the same definition for an extreme value as the definition that has been used for the rejection region. $\endgroup$ –  Sextus Empiricus Commented Feb 9, 2021 at 11:16
  • 1 $\begingroup$ @ToneyShields, this has to do with the original Fisherian understanding of $p$-value (how extreme the result is as determined by $H_0$ alone, i.e. the sampling distribution of the test statistic under $H_0$) versus the modern Fisher-Neyman-Pearson hybrid ((how extreme the result is as determined by $H_0$ and $H_1$ together). I have a few related threads here . In that (and other) regard(s), the footnote of Sextus' answer is an important one. $\endgroup$ –  Richard Hardy Commented Feb 9, 2021 at 12:41



  8. 8.1: The Elements of Hypothesis Testing

    Every instance of hypothesis testing discussed in this and the following two chapters will have a rejection region like one of the six forms tabulated in the tables above. No matter what the context a test of hypotheses can always be performed by applying the following systematic procedure, which will be illustrated in the examples in the ...

  9. Data analysis: hypothesis testing: 5.1 Acceptance and rejection regions

    In the context of the marketing team's hypothesis testing, the reject region for the one-tailed test with an alpha level of 1% corresponds to the range of z-scores that fall within the top 1% of the normal distribution. ... It also circles the rejection regions of null hypothesis when z = 2.33 and alpha = 0.01.

  10. Statistical hypothesis test

    A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently supports a ... These define a rejection region for each hypothesis. 2 Report the exact level of significance (e.g. p = 0.051 or p = 0.049). Do not refer to "accepting" or "rejecting" hypotheses. If the result is "not significant ...

  11. Hypothesis Testing

    If the null hypothesis is false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-hand) tail of the distribution as shown below. Rejection Region for F Test with a =0.05, df 1 =3 and df 2 =36 (k=4, N=40) For the scenario depicted here, the decision rule is: Reject H 0 if F > 2.87. The ANOVA ...

  12. Critical Value: Definition, Finding & Calculator

    Two-sided hypothesis tests have two rejection regions. Consequently, you'll need two critical values that define them. Because there are two rejection regions, we must split our significance level in half. Each rejection region has a probability of α / 2, making the total likelihood for both areas equal the significance level.

  13. 1.6

    This region, which leads to rejection of the null hypothesis, is called the rejection region. For example, for a significance level of 5%: For an upper-tail test, the critical value is the 95th percentile of the t-distribution with n−1 degrees of freedom; reject the null in favor of the alternative if the t-statistic is greater than this.

  14. Hypothesis Testing: Upper-, Lower, and Two Tailed Tests

    The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. ... Rejection Region for Two-Tailed Z Test (H 1 ...

  15. Critical Region and Confidence Interval

    A critical region, also known as the rejection region, is a set of values for the test statistic for which the null hypothesis is rejected. i.e. if the observed test statistic is in the critical region then we reject the null hypothesis and accept the alternative hypothesis. Critical Values

  16. 12.7: The Summary of Hypothesis Testing for Two Parameters

    In this approach, we construct the rejection region under the probability density curve of the involved distribution based on the type of the test and the significance level alpha. The boundaries of the region are called the critical values and can be found in a similar way for all procedures. ... Due to the logic of a hypothesis testing ...

  17. Confidence intervals and hypothesis testing

    Type II errors' probability depends on the alternative(s) and the rejection region shape. The choice of a test statistic (here, \(\t\)), together with a rejection region (here, \(R\)) constitute a "test" of the null hypothesis. In general, one can imagine constructing many different tests, with different theoretical guarantees and power.

  18. PDF Lecture 7: Hypothesis Testing and ANOVA

    The intent of hypothesis testing is formally examine two opposing conjectures (hypotheses), H0 and HA. These two hypotheses are mutually exclusive and exhaustive so that one is true to the exclusion of the other. We accumulate evidence - collect and analyze sample information - for the purpose of determining which of the two hypotheses is true ...

  19. hypothesis testing

    The significance level is the probability of getting a result in the rejection region, given the null hypothesis is true. Note that the alternative puts an ordering on your test statistic - the values of the test statistic most in keeping with the alternative are the ones you want in your rejection region.. The p-value is the probability of a test statistic at least as extreme (under that ...

  20. 6a.4.2

    Using the rejection region approach, you need to check the table or software for the critical value every time you use a different α value. In addition to just using it to reject or not reject H 0 by comparing p-value to α value, the p-value also gives us some idea of the strength of the evidence against H 0.

  21. Rejection Region in Hypothesis Testing » Data Science Tutorials

    The testing process includes a rejection region (also known as a crucial region). It is a branch of probability that determines if your theory (or "hypothesis") is likely to be correct. Probability Distributions and Rejection Regions rejection region. A two-tailed t-rejection distribution's zones. A probability distribution can be used to ...

  22. hypothesis testing

    You could even choose a region such as $[\mu-d,\mu + d]$ ! It is however not natural, because one would like your rejection set to include weird values of the statistic (values that are far from $\mu$) rather than normal values. Why? Because the test statistic is expected to be a measure of the distance between the data and the null hypothesis ...

  23. 6a.4.3

    Write down clearly the rejection region for the problem. The critical value is the value of the standard normal where 10% fall below it. Using the standard normal table, we can see that the value is -1.28. Step 5: Make a decision about the null hypothesis. The rejection region is any \(z^* \) such that \(z^*<-1.28 \) .

  24. hypothesis testing

    The rejection region is fixed beforehand. If the null hypothesis is true then some α% of the observations will be in the region. The p-value is not the same as this α%. The p-value is computed for each separate observation, and can be different for two observations that both fall inside the rejection region. The p-value indicates how extreme ...

  25. 8.1: The null and alternative hypotheses

    The Null hypothesis \(\left(H_{O}\right)\) is a statement about the comparisons, e.g., between a sample statistic and the population, or between two treatment groups. The former is referred to as a one-tailed test whereas the latter is called a two-tailed test. The null hypothesis is typically "no statistical difference" between the ...
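For the typical significance levels of 0.01, 0.05, and 0.1, the cut-off z-scores that bound the rejection region can be computed directly from the standard normal distribution. Here is a minimal sketch (a hypothetical illustration using SciPy; the function names are standard SciPy, not something from this article):

```python
# Critical z-values that bound the rejection region of a z-test
# for the common significance levels discussed above.
from scipy.stats import norm

for alpha in (0.01, 0.05, 0.10):
    # One-tailed test: the entire alpha sits in one tail.
    one_tailed = norm.ppf(1 - alpha)
    # Two-tailed test: alpha is split equally between the two tails.
    two_tailed = norm.ppf(1 - alpha / 2)
    print(f"alpha={alpha:.2f}: one-tailed z={one_tailed:.2f}, "
          f"two-tailed z=\u00b1{two_tailed:.2f}")
# alpha=0.01: one-tailed z=2.33, two-tailed z=±2.58
# alpha=0.05: one-tailed z=1.64, two-tailed z=±1.96
# alpha=0.10: one-tailed z=1.28, two-tailed z=±1.64
```

If the test statistic computed from the sample falls beyond the relevant cut-off, it lands in the rejection region and we reject the null hypothesis at that significance level.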