Binomial Probability Distribution

In several examples in this chapter, an outcome has included only two possibilities. That is, an individual had or had not had childhood measles, a coin landed with head or tail up, or a tested specimen did or did not have cancer cells. This dichotomous outcome is quite common in experimental work. For example, questionnaires quite often have questions requiring simple yes or no responses, medical tests have positive or negative results, banks either succeed or fail after the first 5 years, and so forth. In each of these cases, there are two outcomes for which we will arbitrarily adopt the generic labels “success” and “failure.” The measles example is such an experiment where each individual in a couple is a “trial,” and each trial produces a dichotomous outcome (yes or no).

The binomial probability distribution describes the distribution of the random variable Y, the number of successes in n trials, if the experiment satisfies the following conditions:

1. The experiment consists of n identical trials.

2. Each trial results in one of two mutually exclusive outcomes, one labeled a "success," the other a "failure."

3. The probability of a success on a single trial is equal to p. The value of p remains constant throughout the experiment.

4. The trials are independent.

The formula or function for computing the probabilities for the binomial probability distribution is given by

p(y) = \frac{n!}{y!(n-y)!}\, p^y (1-p)^{n-y}, \quad \text{for } y = 0, 1, \ldots, n.

The notation n!, called the factorial of n, is the quantity obtained by multiplying n by every positive integer less than n. For example, 7! = 7⋅6⋅5⋅4⋅3⋅2⋅1 = 5040. By definition, 0! = 1.

Derivation of the Binomial Probability Distribution Function

The binomial distribution is one that can be derived with the use of the simple probability rules presented in this chapter. Although memorization of this derivation is not needed, being able to follow it provides insight into the use of probability rules. The formula for the binomial probability distribution can be developed by first observing that p(y) is the probability of getting exactly y successes out of n trials. We know that there are n trials, so there must be (n−y) failures occurring at the same time. Because the trials are independent, the probability of y successes is the product of the probabilities of the y individual successes, which is p^y, and the probability of (n−y) failures is (1−p)^{n−y}. Then the probability of y successes and (n−y) failures is p^y (1−p)^{n−y}.

However, this is the probability of only one of the many sequences of y successes and (n−y) failures, whereas p(y) is defined as the probability of any sequence of y successes and (n−y) failures. We can count the number of such sequences using a counting rule called combinations. This rule says that there are

\binom{n}{y} = \frac{n!}{y!(n-y)!}

ways that we can get y items from n items. Thus, if we have 5 trials, there are

\frac{5!}{2!(5-2)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(3 \cdot 2 \cdot 1)} = 10

ways of arranging 2 successes and 3 failures. (The reader may want to list these and verify that there are 10 of them.)

The probability of y successes, then, is obtained by repeated application of the addition rule: because the individual sequences are mutually exclusive, we add the probability p^y (1−p)^{n−y} once for each sequence, which amounts to multiplying the probability of a single sequence by the number of possible sequences, resulting in the above formula.
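These counting and probability calculations are easy to check numerically. The following sketch uses base R (R is used elsewhere in this article for binomial calculations); the choice of y = 2 successes in n = 5 trials with p = 0.2 is an illustrative assumption, not part of the original text.

# factorial and combinations from the text's examples
factorial(7)     # 7! = 5040
choose(5, 2)     # 10 ways to arrange 2 successes and 3 failures in 5 trials

# binomial formula versus R's built-in dbinom
n <- 5; y <- 2; p <- 0.2
choose(n, y) * p^y * (1 - p)^(n - y)   # 0.2048 from the formula
dbinom(y, size = n, prob = p)          # same value from the built-in function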

Note that the measles example satisfies the conditions for a binomial experiment. That is, we label "having had childhood measles" a success, the number of trials is two (a couple is an experiment, and an individual a trial), and p = 0.2, using the value from the national health study. We also assume that each individual has the same chance of having had measles as a child, hence p is constant for all trials, and we have previously assumed that the incidence of measles is independent between the individuals. The random variable Y is the number in each couple who have had measles. Using the binomial distribution function, we obtain

P(Y = 0) = \frac{2!}{0!(2-0)!} (0.2)^0 (0.8)^{2-0} = 0.64,
P(Y = 1) = \frac{2!}{1!(2-1)!} (0.2)^1 (0.8)^{2-1} = 0.32,
P(Y = 2) = \frac{2!}{2!(2-2)!} (0.2)^2 (0.8)^{2-2} = 0.04.

These probabilities agree exactly with those that were obtained earlier from basic principles, as they should.
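All three probabilities can be reproduced with a single call to R's built-in binomial pmf (an added check, not part of the original text):

dbinom(0:2, size = 2, prob = 0.2)   # returns 0.64 0.32 0.04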

For small to moderate sample sizes, many scientific calculators and spreadsheet programs have the binomial probability distribution as a function. For larger samples, there is an approximation that is useful both in practice and in deriving methods of statistical inference. The use of this approximation is presented in Section 2.5 and additional applications are presented in subsequent chapters.

The binomial distribution has only one parameter, p (n is usually considered a fixed value). The mean and variance of the binomial distribution are expressed in terms of p as

\mu = np, \qquad \sigma^2 = np(1-p).

For our health study example, n = 2 and p = 0.2 give

\mu = 2(0.2) = 0.4, \qquad \sigma^2 = (2)(0.2)(0.8) = 0.32.

Again these results are identical to the values previously computed for this example.
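As an added sanity check, the shortcut formulas np and np(1 − p) agree with the definition-based weighted sums over the distribution (base R):

y <- 0:2
py <- dbinom(y, size = 2, prob = 0.2)
sum(y * py)             # mean: 0.4, equal to np
sum((y - 0.4)^2 * py)   # variance: 0.32, equal to np(1 - p)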


URL: https://www.sciencedirect.com/science/article/pii/B9780128230435000023

Binomial Probability

Chris Tsokos, Rebecca Wooten, in The Joy of Finite Mathematics, 2016

Basic Problems

Using a binomial probability distribution to find probabilities:

6.4.1. Given X ~ Binomial(n = 10, p = 0.2), find P(2).

6.4.2. Given X ~ Binomial(n = 100, p = 0.2), find P(20).

6.4.3. Given X ~ Binomial(n = 25, p = 0.4), find P(20).

At Most, At Least & Exactly!

6.4.4. State the assumptions that underlie the binomial probability distribution and give an example of a physical situation that satisfies these assumptions.

6.4.5. Fair coins: A fair coin is tossed 10 times. What is the probability we will observe:
a. Exactly 6 heads?
b. At most 6 heads?
c. At least 6 heads?
(An R sketch for this problem appears below.)
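As an illustration of how "exactly," "at most," and "at least" questions map onto binomial functions, here is a base R sketch for problem 6.4.5 (an added example, not part of the original exercises):

# n = 10 tosses of a fair coin, p = 0.5
dbinom(6, 10, 0.5)       # exactly 6 heads: about 0.2051
pbinom(6, 10, 0.5)       # at most 6 heads: about 0.8281
1 - pbinom(5, 10, 0.5)   # at least 6 heads: about 0.3770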

6.4.6. Bull's eye: A man fires at a target six times; the probability of him hitting the bull's eye is 0.40 on each trial.
a. What is the probability that the man will hit the target at least once?
b. What is the probability that he will not hit the target at all?

6.4.7. Batting average: A baseball player's batting average is 0.310. If in a given game he bats four times, what is the probability that he will get
a. No hits?
b. At most two hits?
c. At least two hits?

6.4.8. Standard deck of cards: A card is drawn and replaced four times from a standard deck of 52 cards. What is the probability that
a. Four aces were drawn?
b. Four diamonds were drawn?
c. Four picture cards were drawn?
d. Why is it necessary that the card was replaced?

6.4.9. Birth: Assuming that newborns are equally likely to be boys or girls, what is the probability that a family of six children will have at least two boys?

6.4.10. Space travel: Assume it is known that 99.8% of the launchings of satellites into orbit are successful. What is the probability that in the next five launchings there will be
a. No mishaps?
b. Exactly one mishap?
c. At least one mishap?

6.4.11. Passing statistics: The probability that a history student will pass a statistics course is 0.80. What is the probability that out of 10 history majors enrolled in such a course
a. At least five will pass?
b. None will fail?

6.4.12. Christmas-treeing: When a student does not know the answers on a multiple-choice test, they often complete the test at random, creating strings of darkened circles that resemble strings of Christmas-tree lights. A student Christmas-trees a 10-question exam where each question has five options, of which exactly one is correct.
a. What is the probability that the student correctly answered exactly 7 questions?
b. What is the probability that the student passed with 7 or more correct answers?
c. What is the probability that the student answers at most 6 correctly?
(An R sketch for this problem appears below.)
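Because guessing gives a success probability of p = 1/5 = 0.2 on each of n = 10 independent questions, the three parts of 6.4.12 can be checked in base R (an added illustration, not part of the original exercises):

# n = 10 questions, p = 0.2 chance of guessing each one correctly
dbinom(7, 10, 0.2)       # exactly 7 correct: about 0.00079
1 - pbinom(6, 10, 0.2)   # 7 or more correct: about 0.00086
pbinom(6, 10, 0.2)       # at most 6 correct: about 0.99914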

6.4.13. English alphabet: The most frequently used letter in the English alphabet is E; the letter Z is the least frequently used. The letter E occurs 12.7% of the time, whereas Z occurs only 0.07% of the time.
a. What is the probability of at least two Zs occurring in a page containing 2500 characters?
b. What is the probability of exactly two Es occurring in a sentence containing 25 characters?
c. What is the probability of at least two Es occurring in a sentence containing 25 characters?
d. What is the probability of exactly one Z occurring in a page containing 2500 characters?
e. What is the probability of exactly two Zs occurring in a page containing 2500 characters?


URL: https://www.sciencedirect.com/science/article/pii/B9780128029671000061

Normal Probability

Chris Tsokos, Rebecca Wooten, in The Joy of Finite Mathematics, 2016

7.2 Normal Probability Distributions

The normal or Gaussian probability distribution is the most popular and important because of its unique mathematical properties, which facilitate its application to practically any physical problem in the real world: if not for the data's distribution directly, then for the sampling distribution, which is the subject of Section 7.3. It constitutes the basis for the development of many of the statistical methods that we will learn in the following chapters. The study of the mathematical properties of the normal probability distribution is beyond the scope of this book; however, we shall concentrate on its usefulness in characterizing the behavior of continuous random variables that frequently occur in daily experience.

The normal probability distribution was discovered by Abraham De Moivre in 1733 as a way of approximating the binomial probability distribution when the number of trials in a given experiment is very large. In 1774, Laplace studied the mathematical properties of the normal probability distribution. Through a historical error, the discovery of the normal distribution was attributed to Gauss who first referred to it in a paper in 1809. In the nineteenth century, many scientists noted that measurement errors in a given experiment followed a pattern (the normal curve of errors) that was closely approximated by this probability distribution. The normal probability distribution is formally defined as follows:

Definition 7.2.1

Normal Probability Distribution

A continuous random variable X is normally distributed or follows a normal probability distribution if its probability distribution is given by the following function:

f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2},
\quad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad 0 < \sigma^2 < \infty.

The universally accepted notation X ~ N(μ, σ²) is read as "the continuous random variable X is normally distributed with a population mean μ and population variance σ²." Of course, in real-world problems we do not know the true population parameters; we estimate them from the sample mean and sample variance. However, first we must fully understand the normal probability distribution.

The graph of the normal probability distribution is a "bell-shaped" curve, as shown in Figure 7.3. The constants μ and σ² are the parameters; namely, "μ" is the population true mean (or expected value) of the subject phenomenon characterized by the continuous random variable X, and "σ²" is the population true variance. Hence, "σ" is the population standard deviation, and the points located at μ − σ and μ + σ are the points of inflection; that is, where the graph changes from cupping up to cupping down.
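To make the definition concrete, the density formula can be evaluated directly and compared against R's built-in dnorm (an added sketch; the parameter values μ = 10, σ = 2, x = 11.5 are arbitrary illustrative choices):

mu <- 10; sigma <- 2; x <- 11.5
(1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2))   # density from the formula
dnorm(x, mean = mu, sd = sigma)                                    # same value, about 0.1506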


Figure 7.3. Normal probability with points of inflection μ − σ and μ + σ.

The area under the bell-shaped curve is so disposed that it represents probability; that is, the total area under the curve is equal to one. The random variable X can assume values anywhere from minus infinity to plus infinity, but in practice we very seldom encounter problems in which random variables have such a wide range. The normal curve (the graph of the normal probability distribution) is symmetric with respect to the mean μ as the central position. That is, the area between μ and κ units to the left of μ is equal to the area between μ and κ units to the right of μ. This fact is illustrated in Figure 7.4.


Figure 7.4. Normal probability from the center μ to μ + κ, that is, the area P(μ ≤ X ≤ μ + κ), κ units above center.
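The symmetry property just described is easy to verify numerically with R's pnorm (an added sketch; the values of μ, σ, and κ below are arbitrary illustrative choices):

mu <- 0; sigma <- 1; k <- 1.3
pnorm(mu + k, mu, sigma) - pnorm(mu, mu, sigma)   # area from mu up to mu + k, about 0.4032
pnorm(mu, mu, sigma) - pnorm(mu - k, mu, sigma)   # equal area, k units below the mean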

There is not a unique normal probability distribution, since the mathematical formula of the graph depends on the two parameters, the mean μ and the variance σ². Figure 7.5 is a graphical representation of the normal distribution for a fixed value of σ² with μ varying.


Figure 7.5. Normal probability distribution for fixed σ and varying μ.

Recall that the variance or standard deviation is a measure of the spread or "dispersion" of the random variable X around its expected value or central tendency, μ. Thus, σ² of the normal distribution determines the shape of the bell-shaped curve. Figure 7.6 is a graphical representation of the normal distribution for a fixed value of μ with varying σ². The expected value μ locates the central tendency of the random variable X, and the variance σ² determines the shape of the bell-shaped curve. That is, for small values of σ², the distribution is clustered close to the mean; as σ² increases, the distribution spreads away from the mean. Despite the fact that the shapes are different, the total area under each curve, which represents probability, is equal to one.


Figure 7.6. Normal probability distribution for fixed μ and varying σ.


URL: https://www.sciencedirect.com/science/article/pii/B9780128029671000073

Additional Topics in Probability

Kandethody M. Ramachandran, Chris P. Tsokos, in Mathematical Statistics with Applications in R (Second Edition), 2015

3.7.2 Minitab Examples

Minitab contains subroutines that can do pdf and cdf computations. For example, for binomial random variables, the pdf and cdf can be computed using the following commands, respectively.

MTB > pdf k;

SUBC > binomial n p.

and

MTB > cdf;

SUBC > binomial n p.

Practice: Try the following and see what you get.

MTB > pdf 3;

SUBC > binomial 5 0.40.

will give

K     P(X = K)
3.00  0.2304

And

MTB > cdf;

SUBC > binomial 5 0.40.

will give

BINOMIAL WITH N = 5 P = 0.400000

K P(X LESS OR = K)

0 0.0778

1 0.3370

2 0.6826

3 0.9130

4 0.9898

5 1.0000

Similarly, if we want to calculate the cdf for a normal probability distribution with mean k and standard deviation s, use the following commands.

MTB > cdf x;

SUBC > normal k s.

will give P(X ≤ x).

Practice: Try the following.

MTB > cdf 4.20;

SUBC > normal 4 2.

We can use the invcdf command to find the inverse cdf. For a given probability p, P(X ≤ x) = F(x) = p, we can find x for a given distribution. For example, for a normal probability distribution with mean k and standard deviation s, use the following.

MTB > invcdf p;

SUBC > normal k s.

We can also use the pull-down menus to compute the probabilities. The following example illustrates this for a binomial probability distribution.

Example 3.7.1

A manufacturer of a color printer claims that only 5% of their printers require repairs within the first year. If out of a random sample of 18 of their printers, four required repairs within the first year, does this tend to refute or support the manufacturer’s claim? Use Minitab.

Solution

Type the numbers 1 through 18 in C1. Then

Calc > Probability Distributions > Binomial. . . > choose Cumulative probability > in Number of trials, enter 18 and in Probability of success, enter 0.05 > in Input column: type C1 > Click OK

The required probability is P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − 0.9891 = 0.0109.
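For readers without Minitab, the same probability is a one-liner in base R (an added cross-check, not part of the original example):

1 - pbinom(3, size = 18, prob = 0.05)   # about 0.0109, matching the Minitab result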

Distribution checking

In order to perform the right statistical analysis, it is necessary to know the distribution of the data we are using. We can use Minitab to do this with the following steps.

1. Choose Stat > Quality Tools > Individual Distribution Identification.

2. Specify the column of data to analyze and the distribution to check it against.

3. Click OK.


URL: https://www.sciencedirect.com/science/article/pii/B9780124171138000035

Additional topics in probability

Kandethody M. Ramachandran, Chris P. Tsokos, in Mathematical Statistics with Applications in R (Third Edition), 2021

3.2.1 The binomial probability distribution

The simplest distribution is the one with only two possible outcomes. For example, when a coin (not necessarily fair) is tossed, the outcomes are heads or tails, with each outcome occurring with some positive probability. These two possible outcomes may be referred to as “success” if heads occurs and “failure” if tails occurs. Assume that the probability of heads appearing in a single toss is p; then the probability of tails is 1 − p = q. We define a random variable X associated with this experiment as taking value 1 with probability p if heads occurs and value 0 if tails occurs, with probability q. Such a random variable X is said to have a Bernoulli probability distribution. That is, X is a Bernoulli random variable if, for some p, 0 ≤ p ≤ 1, the probability P(X = 1) = p, and P(X = 0) = 1 − p. The probability function of a Bernoulli random variable X can be expressed as:

p(x) = P(X = x) =
\begin{cases}
p^x (1-p)^{1-x}, & x = 0, 1 \\
0, & \text{otherwise.}
\end{cases}

Note that this distribution is characterized by the single parameter p. It can be easily verified that the mean and variance of X are E[X] = p and Var(X) = pq, respectively, and the mgf is M_X(t) = p e^t + (1 − p).

Even when the experimental values are not dichotomous, reclassifying the variable as a Bernoulli variable can be helpful. For example, consider blood pressure measurements. Instead of representing the numerical values of blood pressure, if we reclassify the blood pressure as “high blood pressure” and “low blood pressure,” we may be able to avoid dealing with a possible misclassification due to diurnal variation, stress, and so forth, and concentrate on the main issue, which would be, is the average blood pressure unusually high?

In a succession of Bernoulli trials, one is more interested in the total number of successes (whenever a 1 occurs in a Bernoulli trial, we term it a “success”). The probability of observing exactly k successes in n independent Bernoulli trials yields the binomial probability distribution. In practice, the binomial probability distribution is used when we are concerned with the occurrence of an event, not its magnitude. For example, in a clinical trial, we may be more interested in the number of survivors after a treatment.

Definition 3.2.1

A binomial experiment is one that has the following properties: (1) The experiment consists of n identical trials. (2) Each trial results in one of the two outcomes, called a success S and failure F. (3) The probability of success on a single trial is equal to p and remains the same from trial to trial. The probability of failure is 1 − p = q. (4) The outcomes of the trials are independent. (5) The random variable X is the number of successes in n trials.

We have seen that the number of ways of obtaining x successes in n trials is given by:

\binom{n}{x} = \frac{n!}{x!(n-x)!}.

Definition 3.2.2

A random variable X is said to have binomial probability distribution with parameters (n, p) if and only if:

P(X = x) = p(x) = \binom{n}{x} p^x q^{n-x} =
\begin{cases}
\dfrac{n!}{x!(n-x)!}\, p^x q^{n-x}, & x = 0, 1, 2, \ldots, n, \ \ 0 \le p \le 1, \ \text{and } q = 1 - p \\
0, & \text{otherwise.}
\end{cases}

To show the dependence on n and p, we denote p(x) by b(x; n, p) and the cumulative probability distribution by:

B(x; n, p) = \sum_{i=0}^{x} b(i; n, p).

The binomial probabilities have been tabulated and are given in the binomial table.

By the binomial theorem, we have:

(p + q)^n = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}.

Because (p + q) = 1, we conclude that \sum_{x=0}^{n} b(x; n, p) = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} = 1^n = 1 for all n ≥ 1 and 0 ≤ p ≤ 1. Hence, p(x) is indeed a probability mass function (pmf). The binomial probability distribution is characterized by two parameters, the number of independent trials n and the probability of success p. The following R commands help with binomial calculations, as collected in the sketch below. To compute the full set of probabilities, say for n = 10 and p = 0.2, use "dbinom(0:10, 10, 0.2)". To compute a single probability such as P(X = 6), use "dbinom(6, 10, 0.2)". For a cumulative probability, say P(X ≤ 3), use "pbinom(3, 10, 0.2)".
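Collected as a runnable snippet (an added illustration of the commands just described):

n <- 10; p <- 0.2
dbinom(0:n, n, p)   # P(X = 0), P(X = 1), ..., P(X = 10)
dbinom(6, n, p)     # P(X = 6)
pbinom(3, n, p)     # P(X <= 3), the cumulative probability B(3; 10, 0.2)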

Example 3.2.1

It is known that screws produced by a certain machine will be defective with probability 0.01 independent of one another. If we randomly pick 10 screws produced by this machine, what is the probability that at least two screws will be defective?

Solution

Let X be the number of defective screws out of 10. Then X can be considered as a binomial random variable with parameters (10, 0.01). Hence, using the binomial pmf p(x) given in Definition 3.2.2, we obtain the probability that at least two screws will be defective as:

P(X \ge 2) = \sum_{x=2}^{10} \binom{10}{x} (0.01)^x (0.99)^{10-x} = 1 - [P(X = 0) + P(X = 1)] = 0.004.

R-command: 1-pbinom(1,10,0.01)

In Chapter 2, we introduced Mendel's law. In biology, the result “gene frequencies and genotype ratios in a randomly breeding population remain constant from generation to generation” is known as the Hardy–Weinberg law.

Example 3.2.2

Suppose we know that the frequency of a dominant gene, A, in a population is 0.2. If we randomly select eight members of this population, what is the probability that at least six of them will display the dominant phenotype? Assume that the population is sufficiently large that removing eight individuals will not affect the frequency and that the population is in Hardy–Weinberg equilibrium.

Solution

First of all, note that an individual can have the dominant gene, A, if the person has traits AA, aA, or Aa. Hence, if the gene frequency is 0.2, the probability that an individual is of genotype A is:

P(A) = P(AA \cup Aa \cup aA) = P(AA) + 2P(Aa) = (0.2)^2 + 2(0.2)(0.8) = 0.36.

Let X denote the number of individuals out of eight that display the dominant phenotype. Then X is binomial with n = 8, and p = 0.36. Thus, the probability that at least six of them will display the dominant phenotype is:

P(X \ge 6) = P(X = 6) + P(X = 7) + P(X = 8) = \sum_{i=6}^{8} \binom{8}{i} (0.36)^i (0.64)^{8-i} = 0.029259.

R-command: 1-pbinom(5,8,0.36)

For large n, calculation of the binomial probabilities is tedious. Many statistical software packages have binomial probability distribution commands. For the purpose of this book, we will use the binomial table, which gives the cumulative probabilities B(x; n, p) for n = 2 through n = 20 and p = 0.05, 0.10, 0.15, …, 0.90, 0.95. If we need the probability of a single term, we can use the relation:

P(X = x) = b(x; n, p) = B(x; n, p) - B(x - 1; n, p).
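This single-term relation is easy to confirm numerically (an added check in base R; the values n = 18, p = 0.05, x = 3 anticipate Example 3.2.3 below):

n <- 18; p <- 0.05; x <- 3
dbinom(x, n, p)                         # b(x; n, p), about 0.0473
pbinom(x, n, p) - pbinom(x - 1, n, p)   # B(x; n, p) - B(x-1; n, p), same value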

Example 3.2.3

A manufacturer of inkjet printers claims that only 5% of their printers require repairs within the first year. If, of a random sample of 18 of the printers, four required repairs within the first year, does this tend to refute or support the manufacturer's claim?

Solution

Let us assume that the manufacturer's claim is correct; that is, the probability that a printer will require repairs within the first year is 0.05. Suppose 18 printers are chosen at random. Let p be the probability that any one of the printers will require repairs within the first year. We now find the probability that at least four of the 18 will require repairs during the first year. Let X represent the number of printers that require repair within the first year. Then X follows the binomial pmf with p = 0.05, n = 18. The probability that four or more of the 18 will require repair within the first year is given by:

P(X \ge 4) = \sum_{x=4}^{18} \binom{18}{x} (0.05)^x (0.95)^{18-x}

or, using the binomial table:

\sum_{x=4}^{18} b(x; 18, 0.05) = 1 - B(3; 18, 0.05) = 1 - 0.9891 = 0.0109.

This value (approximately 1.1%) is very small. We have shown that if the manufacturer's claim is correct, then the chances of observing four or more bad printers out of 18 are very small. But we did observe exactly four bad ones. Therefore, we must conclude that the manufacturer's claim cannot be substantiated.

Mean, Variance, and Moment-Generating Function of a Binomial Random Variable

Theorem 3.2.1

If X is a binomial random variable with parameters n and p, then:

E(X) = \mu = np

and

Var(X) = \sigma^2 = np(1-p).

Also, the mgf is:

M_X(t) = [p e^t + (1-p)]^n.

Proof. We derive the mean and the variance. The derivation for the mgf is given in Example 2.6.8. Using the binomial pmf, p(x) = \frac{n!}{x!(n-x)!} p^x q^{n-x}, and the definition of expectation, we have:

\mu = E(X) = \sum_{x=0}^{n} x\, p(x) = \sum_{x=0}^{n} x\, \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} = \sum_{x=1}^{n} \frac{n!}{(x-1)!(n-x)!} p^x (1-p)^{n-x},

since the first term in the sum (the x = 0 term) is zero.

Let i = x − 1. When x varies from 1 through n, i = (x − 1) varies from 0 through (n − 1). Hence,

\mu = \sum_{i=0}^{n-1} \frac{n!}{i!(n-i-1)!} p^{i+1} (1-p)^{n-i-1} = np \sum_{i=0}^{n-1} \frac{(n-1)!}{i!(n-1-i)!} p^i (1-p)^{n-1-i} = np,

because the last sum is that of a binomial pmf with parameters (n − 1) and p, and hence equals 1.

To find the variance, we first calculate E[X(X − 1)]:

E[X(X-1)] = \sum_{x=0}^{n} x(x-1) \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} = \sum_{x=2}^{n} \frac{n!}{(x-2)!(n-x)!} p^x (1-p)^{n-x},

because the first two terms are 0. Let i = x − 2. Then,

E[X(X-1)] = \sum_{i=0}^{n-2} \frac{n!}{i!(n-i-2)!} p^{i+2} (1-p)^{n-i-2} = n(n-1)p^2 \sum_{i=0}^{n-2} \frac{(n-2)!}{i!(n-2-i)!} p^i (1-p)^{n-2-i} = n(n-1)p^2,

because the last sum is that of a binomial pmf with parameters (n − 2) and p, and thus equals 1.

Note that E[X(X - 1)] = E(X^2) - E(X), and so we obtain:

\sigma^2 = Var(X) = E(X^2) - [E(X)]^2 = E[X(X-1)] + E(X) - [E(X)]^2 = n(n-1)p^2 + np - (np)^2 = -np^2 + np = np(1-p).
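As a numerical sanity check on Theorem 3.2.1 (an added sketch in base R; the values n = 8 and p = 0.36 are borrowed from Example 3.2.2):

n <- 8; p <- 0.36
x <- 0:n; px <- dbinom(x, n, p)
sum(x * px)               # mean: 2.88, equal to n*p
sum((x - n*p)^2 * px)     # variance: 1.8432, equal to n*p*(1-p)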


URL: https://www.sciencedirect.com/science/article/pii/B9780128178157000038

The Usefulness of Mathematics

Chris Tsokos, Rebecca Wooten, in The Joy of Finite Mathematics, 2016

1.6 Bernoulli Trials


"Trial" is Anglo-French, meaning act or process of testing. A Bernoulli trial is an experiment whose outcome is random but has one of only two possible outcomes: success or failure. The discrete probability distribution that we use to answer such questions, among others, is the binomial or Bernoulli probability distribution: a mathematical expression that generates the actual probability for specific inputs that relate to a given question. We encounter many important situations that can be characterized by a discrete random variable with this distribution.

Discrete: A type of measure such that the outcomes are separate and distinct.

Random: Taken such that each individual is equally likely to be selected.

Variable: A distinct characteristic of an individual to be observed or measured.

It is our goal in studying Bernoulli trials to put ourselves in a position to compute binomial probabilities and address such questions as:

Births: A baby born at less than 36 weeks is considered premature. What is the probability that a baby will be born premature?

Medicine: What is the probability that a given drug will be effective to cure a specific disease?

Politics: What is the probability that Candidate A will be elected president of the US?

Gambling: What is the probability that I will obtain an odd number in a single roll of a fair die?

Computers: What is the probability that the computer you purchased online will be operable (non-defective)?

We will learn how to use this very important probability distribution to answer the above questions, among others.

How dare we speak of the laws of chance? Is not chance the antithesis of all law?

Joseph Bertrand


URL: https://www.sciencedirect.com/science/article/pii/B9780128029671000012

Null Hypothesis Significance Testing

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

11.1.2 With intention to fix N

Suppose we ask the assistant why she stopped flipping the coin. She says that her lucky number is 24, so she decided to stop when she completed 24 flips of the coin. This means the space of possible outcomes is restricted to combinations of z and N for which N is fixed at N = 24. This corresponds to a single column of the z, N space as shown in the left panel of Figure 11.2 (which shows N = 5 highlighted instead of N = 24 because of lack of space). The computational question then becomes, what is the probability of the actual proportion, or a proportion more extreme than expected, within that column of the outcome space?

What is the probability of getting a particular number of heads when N is fixed? The answer is provided by the binomial probability distribution, which states that the probability of getting z heads out of N flips is

(11.5) \quad p(z \mid N, \theta) = \binom{N}{z}\, \theta^z (1-\theta)^{N-z}

where the notation \binom{N}{z} will be defined below. The binomial distribution is derived by the following logic. Consider any specific sequence of N flips with z heads. The probability of that specific sequence is simply the product of the individual flips, which is the product of Bernoulli probabilities \prod_i \theta^{y_i} (1-\theta)^{1-y_i} = \theta^z (1-\theta)^{N-z}, which we first saw in Section 6.1, p. 124. But there are many different specific sequences with z heads. Let's count how many there are. Consider allocating z heads to N flips in the sequence. The first head could go in any one of the N slots. The second head could go in any one of the remaining N − 1 slots. The third head could go in any one of the remaining N − 2 slots. And so on, until the zth head could go in any one of the remaining N − (z − 1) slots. Multiplying those possibilities together means that there are N · (N − 1) · … · (N − (z − 1)) ways of allocating z heads to N flips. As an algebraic convenience, notice that N · (N − 1) · … · (N − (z − 1)) = N!/(N − z)!, where "!" denotes factorial. In this counting of the allocations, we've counted different orderings of the same allocation separately. For example, putting the first head in the first slot and the second head in the second slot was counted as a different allocation than putting the first head in the second slot and the second head in the first slot. There is no meaningful difference in these allocations, because they both have a head in the first and second slots. Therefore, we remove this duplicate counting by dividing out by the number of ways of permuting the z heads among their z slots. The number of permutations of z items is z!. Putting this all together, the number of ways of allocating z heads among N flips, without duplicate counting of equivalent allocations, is N!/[(N − z)! z!]. This factor is also called the number of ways of choosing z items from N possibilities, or "N choose z" for short, and is denoted \binom{N}{z}. Thus, the overall probability of getting z heads in N flips is the probability of any particular sequence of z heads in N flips times the number of ways of choosing z slots from among the N possible flips. The product appears in Equation 11.5.

A graph of a binomial probability distribution is provided in the right panel of Figure 11.3, for N = 24 and θ = 0.5. Notice that the graph contains 25 spikes, because there are 25 possible proportions, from 0/24, 1/24, 2/24, through 24/24. The binomial probability distribution in Figure 11.3 is also called a sampling distribution. This terminology stems from the idea that any set of N flips is a representative sample of the behavior of the coin. If we were to repeatedly run experiments with a fair coin, such that in every experiment we flip the coin exactly N times, then, in the long run, the probability of getting each possible z would be the distribution shown in Figure 11.3. To describe it carefully, we would call it "the probability distribution of the possible sample outcomes," but that's usually just abbreviated as "the sampling distribution."


Figure 11.3. The imaginary cloud of possible outcomes when N is fixed. The null hypothesis likelihood distribution and parameter are shown on the left. The stopping intention is shown in the middle. The sampling distribution and p value are shown on the right. Compare with Figures 11.4 and 11.5.

Terminological aside: Statistical methods that rely on sampling distributions are sometimes called frequentist methods. A particular application of frequentist methods is NHST.

Figure 11.3 is a specific case of the general structure shown in Figure 11.1. The left side of Figure 11.3 shows the null hypothesis as the probability distribution for the two states of the coin, with θ = 0.5. This corresponds to the face in the lower-left corner of Figure 11.1, who is thinking of a particular hypothesis. The middle of Figure 11.3 shows an arrow marked with the sampling intention. This arrow indicates the intended manner by which random samples will be generated from the null hypothesis. This sampling intention also corresponds to the face in the lower-left corner of Figure 11.1, who is thinking of the sampling intention. The right side of Figure 11.3 shows the resulting probability distribution of possible outcomes. This sampling distribution corresponds to the cloud of imaginary possibilities in Figure 11.1.

It is important to understand that the sampling distribution is a probability distribution over samples of data, and is not a probability distribution over parameter values. The right side of Figure 11.3 has the sample proportion, z/N, on its abscissa, and does not have the parameter value, θ, on its abscissa. Notice that the parameter value, θ, is fixed at a specific value and appears in the left panel of the figure.

Our goal, as you might recall, is to determine whether the probability of getting the observed result, z/N = 7/24, is tiny enough that we can reject the null hypothesis. By using the binomial probability formula in Equation 11.5, we determine that the probability of getting exactly z = 7 heads in N = 24 flips is 2.063%. Figure 11.3 shows this probability as the height of the bar at z/N = 7/24 (where the “+” is plotted). However, we do not want to determine the probability of only the actually observed result. After all, for large N, any specific result z can be very improbable. For example, if we flip a fair coin N = 1000 times, the probability of getting exactly z = 500 heads is only 2.5%, even though z = 500 is precisely what we would expect if the coin were fair.

Therefore, instead of determining the probability of getting exactly the result z/N from the null hypothesis, we determine the probability of getting z/N or a result even more extreme than expected from the null hypothesis. The reason for considering more extreme outcomes is this: If we would reject the null hypothesis because the result z/N is too far from what we would expect, then any other result that has an even more extreme value would also cause us to reject the null hypothesis. Therefore we want to know the probability of getting the actual outcome or an outcome more extreme relative to what we expect. This total probability is referred to as “the p value.” The p value defined at this point is the “one-tailed” p value, because it sums the extreme probabilities in only one tail of the sampling distribution. (The term “tail” here refers to the end of a sampling distribution, not to the side of a coin.) In practice, the one-tailed p value is multiplied by 2, to get the two-tailed p value. We consider both tails of the sampling distribution because the null hypothesis could be rejected if the outcome were too extreme in either direction. If this p value is less than a critical amount, then we reject the null hypothesis.

The critical two-tailed probability is conventionally set to 5%. In other words, we will reject the null hypothesis whenever the total probability of the observed z/N or an outcome more extreme is less than 5%. Notice that this decision rule will cause us to reject the null hypothesis 5% of the time when the null hypothesis is true, because the null hypothesis itself generates those extreme values 5% of the time, just by chance. The critical probability, 5%, is the proportion of false alarms that we are willing to tolerate in our decision process. When considering a single tail of the distribution, the critical probability is half of 5%, that is, 2.5%.

Here's the conclusion for our particular case. The actual observation was z/N = 7/24. The one-tailed probability is p = 0.032, which was computed from Equation 11.4, and is shown in Figure 11.3. Because the p value is not less than 2.5%, we do not reject the null hypothesis that θ = 0.5. In NHST parlance, we would say that the result “has failed to reach significance.” This does not mean we accept the null hypothesis; we merely suspend judgment regarding rejection of this particular hypothesis. Notice that we have not determined any degree of belief in the hypothesis that θ = 0.5. The hypothesis might be true or might be false; we suspend judgment.
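Both quantities quoted in this section can be reproduced with base R's binomial pmf and cdf (an added sketch, not part of the original text):

dbinom(7, size = 24, prob = 0.5)   # P(z = 7 | N = 24, theta = 0.5), about 0.0206
pbinom(7, size = 24, prob = 0.5)   # one-tailed p value, P(z <= 7), about 0.032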


URL: https://www.sciencedirect.com/science/article/pii/B9780124058880000118

Principles of Inference

Donna L. Mohr, ... Rudolf J. Freund, in Statistical Methods (Fourth Edition), 2022

3.2.5 Probabilities of Making Errors

If we assume that we have the results of a random sample, we can use the characteristics of sampling distributions presented in Chapter 2 to calculate the probabilities of making either a type I or type II error for any specified decision rule.

Definition 3.6

α: denotes the probability of making a type I error

β: denotes the probability of making a type II error

The ability to provide these probabilities is a key element in statistical inference, because they measure the reliability of our decisions. We will now show how to calculate these probabilities for our examples.

Calculating α for Example 3.2

The null hypothesis specifies that the probability of drawing a red jelly bean is 0.4 (bowl 2), and the null hypothesis is to be rejected with the occurrence of five red jelly beans. Then the probability of making a type I error is the probability of getting five red jelly beans in a sample of five from bowl 2. If we let Y be the number of red jelly beans in our sample of five, then

\alpha = P(Y = 5 \text{ when } p = 0.4).

The use of the binomial probability distribution (Section 2.3) provides the result α = (0.4)^5 = 0.01024. Thus the probability of incorrectly rejecting a true null hypothesis in this case is 0.01024; that is, there is approximately a 1 in 100 chance that bowl 2 will be mislabeled bowl 1 using the described decision rule.
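In R this α is a single binomial evaluation (an added check, not part of the original text):

dbinom(5, size = 5, prob = 0.4)   # P(Y = 5 when p = 0.4) = 0.4^5 = 0.01024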

Calculating α for Example 3.3

For this example, the null hypothesis was to be rejected if the mean weight was less than 7.9 or greater than 8.1 oz. If Y¯ is the sample mean weight of 16 jars, the probability of a type I error is

\alpha = P(\bar{Y} < 7.9 \text{ or } \bar{Y} > 8.1 \text{ when } \mu = 8).

Assume for now that we know3 that σ, the standard deviation of the population of weights, is 0.2 and that the distribution of weights is approximately normal. If the null hypothesis is true, the sampling distribution of the mean of 16 jars is normal with μ = 8 and σ = 0.2/√16 = 0.05 (see discussion on the normal distribution in Section 2.5). The probability of a type I error corresponds to the shaded area in Fig. 3.1.


Figure 3.1. Rejection Region for Sample Mean.

Using the tables of the normal distribution we compute the area for each portion of the rejection region

P(\bar{Y} < 7.9) = P\left(Z < \frac{7.9 - 8}{0.2/\sqrt{16}}\right) = P(Z < -2.0) = 0.0228

and

P(\bar{Y} > 8.1) = P\left(Z > \frac{8.1 - 8}{0.2/\sqrt{16}}\right) = P(Z > 2.0) = 0.0228.

Hence

\alpha = 0.0228 + 0.0228 = 0.0456.

Thus the probability of adjusting the machine when it does not need it (using the described decision rule) is slightly less than 0.05 (or 5%).
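The same α can be computed without normal tables (an added check in base R; pnorm gives slightly more precision than the table's rounded 0.0228 values):

se <- 0.2 / sqrt(16)
pnorm(7.9, mean = 8, sd = se) + (1 - pnorm(8.1, mean = 8, sd = se))   # about 0.0455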

Calculating β for Example 3.2

Having determined α for a specified decision rule, it is of interest to determine β. This probability can be readily calculated for Example 3.2. Recall that the type II error occurs if we fail to reject the null hypothesis when it is not true. For this example, this occurs if bowl 1 is on the table but we did not get the five red jelly beans required to reject the null hypothesis that bowl 2 is on the table. The probability of a type II error, which is denoted by β, is then the probability of getting four or fewer red jelly beans in a sample of five from bowl 1. If we let Y be the number of red jelly beans in the sample, then

\beta = P(Y \le 4 \text{ when } p = 0.6).

Using the probability rules from Section 2.2, we know that

P(Y \le 4) + P(Y = 5) = 1.

Since (Y = 5) is the complement of (Y ≤ 4),

P(Y \le 4) = 1 - P(Y = 5).

Now

P(Y = 5) = (0.6)^5,

and therefore

\beta = 1 - (0.6)^5 = 1 - 0.07776 = 0.92224.

That is, the probability of making a type II error in Example 3.2 is over 92%. This value of β is unacceptably large. If bowl 1 is truly on the table, the probability we will be unable to detect it is 0.92!
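Again this is a one-line binomial calculation (an added check in base R):

pbinom(4, size = 5, prob = 0.6)   # P(Y <= 4 when p = 0.6) = 1 - 0.6^5 = 0.92224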

Calculating β for Example 3.3

For Example 3.3, H1 does not specify a single value for μ but instead includes all values of μ ≠ 8. Therefore calculating the probability of the type II error requires that we examine the probability of the sample mean being outside the rejection region for every value of μ ≠ 8. These calculations and further discussion of β are presented later in this section where we discuss type II errors.


URL: https://www.sciencedirect.com/science/article/pii/B9780128230435000035

Nonparametrics

Andrew F. Siegel, in Practical Business Statistics (Sixth Edition), 2012

16.1 Testing the Median against a Known Reference Value

On the one hand, when you have an ordinary univariate sample of data from a population, you might use the average and standard error to test a hypothesis about the population mean (the t test). And this is fine if the distribution is normal.

On the other hand, the nonparametric approach, because it is based on the rank ordering of the data, tests the population median. The median is the appropriate summary because it is defined in terms of ranks. (Remember that the median has rank (1 + n)/2 for a sample of size n.)

How can we get rid of the normal distribution assumption? It's easy once you realize that half of the population is below the median and half is above, if the population distribution is continuous.2 There is a binomial probability distribution inherent here, since the data set is a random sample of independent observations. Using probability language from Chapters 6 and 7, we know the number of data values below the population median is the number of "below-median" events that occur in n independent trials, where each event has probability 1/2. Therefore:

The number of sample data values below a continuous population's median follows a binomial distribution where π = 0.5 and n is the sample size.

The Sign Test

The sign test makes use of this binomial distribution. To test whether or not the population median could reasonably be $65,536, for example, you could see how many sample values fall below $65,536 and determine if this is a reasonable observation from a binomial distribution. The sign test decides whether the population median is equal to a given reference value based on the number of sample values that fall below that reference value. No arithmetic is performed on the data values, only comparing and counting. Here is the procedure:

The Sign Test

1. Count the number of data values that are different from the reference value, θ0. This number is m, the modified sample size.

2. Find the limits in the table for this modified sample size.

3. Count how many data values fall below the reference value, θ0, and compare this number to the limits in the table.3

4. If the count from step 3 falls outside the limits of the table, the difference is statistically significant. If it falls at or within the limits, the difference is not statistically significant.

The Hypotheses

First, assume that the population distribution is continuous. The null hypothesis for the sign test claims that the population median, θ, is exactly equal to some specified reference value, θ0. (As usual, this reference value is assumed to be known precisely and was not computed from the current data set.) The research hypothesis claims the contrary: The population median is not equal to this reference value.

Hypothesis for the Sign Test for the Median of a Continuous Population Distribution

H0: θ = θ0
H1: θ ≠ θ0

where θ is the (unknown) population median and θ0 is the (known) reference value being tested.

In general, even if the distribution is not continuous, the sign test will decide whether or not your reference value, θ0, divides the population exactly in half:4

Hypotheses for the Sign Test in General

H0: The probability of being above θ0 is equal to the probability of being below θ0 in the population

H1: These probabilities are not equal

where θ0 is the (known) reference value being tested.

The Assumption

There is an assumption required for validity of the sign test. One of the strengths of this nonparametric method is that so little is required for it to be valid.

Assumption Required for the Sign Test

The data set is a random sample from the population of interest.

Table 16.1.1 lists the ranks for the sign test. If m is larger than 100, you would find the table values for level 0.05 by rounding (m − 1.960√m)/2 and (m + 1.960√m)/2 to the nearest whole numbers. For example, for m = 120, these formulas give 49.3 and 70.7, which round to the table values 49 and 71. For level 0.01, you would round (m − 2.576√m)/2 and (m + 2.576√m)/2.

Table 16.1.1. Ranks for the Sign Test
(The sign test is significant at the given level if the count is less than the lower limit or more than the upper limit; "—" means no test is available at that level.)

m     5% level        1% level
6     1 / 5           — / —
7     1 / 6           — / —
8     1 / 7           1 / 7
9     2 / 7           1 / 8
10    2 / 8           1 / 9
11    2 / 9           1 / 10
12    3 / 9           2 / 10
13    3 / 10          2 / 11
14    3 / 11          2 / 12
15    4 / 11          3 / 12
16    4 / 12          3 / 13
17    5 / 12          3 / 14
18    5 / 13          4 / 14
19    5 / 14          4 / 15
20    6 / 14          4 / 16
21    6 / 15          5 / 16
22    6 / 16          5 / 17
23    7 / 16          5 / 18
24    7 / 17          6 / 18
25    8 / 17          6 / 19
26    8 / 18          7 / 19
27    8 / 19          7 / 20
28    9 / 19          7 / 21
29    9 / 20          8 / 21
30    10 / 20         8 / 22
31    10 / 21         8 / 23
32    10 / 22         9 / 23
33    11 / 22         9 / 24
34    11 / 23         10 / 24
35    12 / 23         10 / 25
36    12 / 24         10 / 26
37    13 / 24         11 / 26
38    13 / 25         11 / 27
39    13 / 26         12 / 27
40    14 / 26         12 / 28
41    14 / 27         12 / 29
42    15 / 27         13 / 29
43    15 / 28         13 / 30
44    16 / 28         14 / 30
45    16 / 29         14 / 31
46    16 / 30         14 / 32
47    17 / 30         15 / 32
48    17 / 31         15 / 33
49    18 / 31         16 / 33
50    18 / 32         16 / 34
51    19 / 32         16 / 35
52    19 / 33         17 / 35
53    19 / 34         17 / 36
54    20 / 34         18 / 36
55    20 / 35         18 / 37
56    21 / 35         18 / 38
57    21 / 36         19 / 38
58    22 / 36         19 / 39
59    22 / 37         20 / 39
60    22 / 38         20 / 40
61    23 / 38         21 / 40
62    23 / 39         21 / 41
63    24 / 39         21 / 42
64    24 / 40         22 / 42
65    25 / 40         22 / 43
66    25 / 41         23 / 43
67    26 / 41         23 / 44
68    26 / 42         23 / 45
69    26 / 43         24 / 45
70    27 / 43         24 / 46
71    27 / 44         25 / 46
72    28 / 44         25 / 47
73    28 / 45         26 / 47
74    29 / 45         26 / 48
75    29 / 46         26 / 49
76    29 / 47         27 / 49
77    30 / 47         27 / 50
78    30 / 48         28 / 50
79    31 / 48         28 / 51
80    31 / 49         29 / 51
81    32 / 49         29 / 52
82    32 / 50         29 / 53
83    33 / 50         30 / 53
84    33 / 51         30 / 54
85    33 / 52         31 / 54
86    34 / 52         31 / 55
87    34 / 53         32 / 55
88    35 / 53         32 / 56
89    35 / 54         32 / 57
90    36 / 54         33 / 57
91    36 / 55         33 / 58
92    37 / 55         34 / 58
93    37 / 56         34 / 59
94    38 / 56         35 / 59
95    38 / 57         35 / 60
96    38 / 58         35 / 61
97    39 / 58         36 / 61
98    39 / 59         36 / 62
99    40 / 59         37 / 62
100   40 / 60         37 / 63
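The large-sample limit formulas quoted above are easy to compute directly (an added sketch in base R, reproducing the m = 120 example):

m <- 120
round((m - 1.960 * sqrt(m)) / 2)   # 49
round((m + 1.960 * sqrt(m)) / 2)   # 71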

Example

Comparing Local to National Family Income

Your upscale restaurant is considering franchises in new communities. One of the ways you screen is by looking at median family income, because the mean family income might be high due to just a few families. A survey of one community estimated the median family income as $70,547, and you are wondering whether this is significantly higher than the national median family income of $27,735.5 It certainly appears that this community has a higher median income, but with a sample of only 25 families, you would like to be careful before coming to a conclusion. Table 16.1.2 shows the data set, indicating those families with incomes below $27,735.

Table 16.1.2. Incomes of Sampled Families

$39,465    $96,270    $16,477*   $138,933   $80,806
$85,421    $5,921*    $70,547    $267,525   $56,240
$187,445   $81,802    $163,819   $14,706*   $83,414
$78,464    $58,525    $54,348    $36,346    $25,479*
$7,081*    $19,605*   $29,341    $137,414   $156,681

*Income below $27,735.

Your reference value is θ0 = $27,735, a number that is not from the data set itself. Here are the steps involved in performing the sign test:

1. All 25 families have incomes different from this reference value, so the modified sample size is m = 25, the same as the actual sample size.

2. The limits from the table for testing at the 5% level are 8 and 17 for m = 25.

3. There are six families with incomes below the reference value.

4. Since the number 6 falls outside the limits (i.e., it is less than 8), you reject the null hypothesis and conclude that the result is statistically significant:

The observed median family income of $70,547 for this community is significantly different from the national median family income of $27,735.
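The same conclusion follows from an exact binomial test on the count of below-reference values (an added check in base R):

# 6 of the 25 informative values fall below the reference value of $27,735
binom.test(6, 25, p = 0.5)$p.value   # about 0.0146, significant at the 5% level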

What are the 4 conditions of a binomial distribution?

1: The number of observations n is fixed. 2: Each observation is independent. 3: Each observation represents one of two outcomes ("success" or "failure"). 4: The probability of "success" p is the same for each outcome.

How many categories does a binomial distribution have?

The binomial is a type of distribution that has two possible outcomes (the prefix "bi" means two, or twice). For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes: pass or fail.

Is binomial distribution mutually exclusive?

The underlying assumptions of the binomial distribution are that there is only one outcome for each trial, that each trial has the same probability of success, and that the trials are independent of one another.

What are the four distributions?

There are many different classifications of probability distributions. Some of them include the normal distribution, chi square distribution, binomial distribution, and Poisson distribution.