Logistic regression models a relationship between predictor variables and a categorical response variable. For example, we could use logistic regression to model the relationship between various measurements of a manufactured specimen (such as dimensions and chemical composition) to predict if a crack greater than 10 mils will occur (a binary variable: either yes or no). Logistic regression helps us estimate a probability of falling into a certain level of the categorical response given a set of predictors. We can choose from three types of logistic regression, depending on the nature of the categorical response variable: Show
Binary Logistic Regression: Used when the response is binary (i.e., it has two possible outcomes). The cracking example given above would utilize binary logistic regression. Other examples of binary responses could include passing or failing a test, responding yes or no on a survey, and having high or low blood pressure. Nominal Logistic Regression: Used when there are three or more categories with no natural ordering to the levels. Examples of nominal responses could include departments at a business (e.g., marketing, sales, HR), type of search engine used (e.g., Google, Yahoo!, MSN), and color (black, red, blue, orange). Ordinal Logistic Regression: Used when there are three or more categories with a natural ordering to the levels, but the ranking of the levels do not necessarily mean the intervals between them are equal. Examples of ordinal responses could be how students rate the effectiveness of a college course on a scale of 1-5, levels of flavors for hot wings, and medical condition (e.g., good, stable, serious, critical). Particular issues with modelling a categorical response variable include nonnormal error terms, nonconstant error variance, and constraints on the response function (i.e., the response is bounded between 0 and 1). We will investigate ways of dealing with these in the binary logistic regression setting here. There is some discussion of the nominal and ordinal logistic regression settings in Section 15.2. The multiple binary logistic regression model is the following: \(\begin{align}\label{logmod} where here \(\pi\) denotes a probability and not the irrational number 3.14....
For a sample of size n, the likelihood for a binary logistic regression is given by: \(\begin{align*} This yields the log likelihood: \(\begin{align*} Maximizing the likelihood (or log likelihood) has no closed-form solution, so a technique like iteratively reweighted least squares is used to find an estimate of the regression coefficients, \(\hat{\beta}\). To illustrate, consider data published on n = 27 leukemia patients. The Leukemia Remission data set has a response variable of whether leukemia remission occurred (REMISS), which is given by a 1. The predictor variables are cellularity of the marrow clot section (CELL), smear differential percentage of blasts (SMEAR), percentage of absolute marrow leukemia cell infiltrate (INFIL), percentage labeling index of the bone marrow leukemia cells (LI), absolute number of blasts in the peripheral blood (BLAST), and the highest temperature prior to start of treatment (TEMP). The following gives the estimated logistic regression equation and associated significance tests from Minitab:
This results in the following output: Coefficients
Wald TestThe Wald test is the test of significance for individual regression coefficients in logistic regression (recall that we use t-tests in linear regression). For maximum likelihood estimates, the ratio \(\begin{equation*} can be used to test \(H_{0} \colon \beta_{i}=0\). The standard normal curve is used to determine the \(p\)-value of the test. Furthermore, confidence intervals can be constructed as \(\begin{equation*} Estimates of the regression coefficients, \(\hat{\beta}\), are given in the Minitab output Coefficients table in the column labeled "Coef." This table also gives coefficient p-values based on Wald tests. The index of the bone marrow leukemia cells (LI) has the smallest p-value and so appears to be closest to a significant predictor of remission occurring. After looking at various subsets of the data, we find that a good model is one which only includes the labeling index as a predictor: Coefficients
Regression Equationp(1) = exp(Y')/(1 + exp(Y')) Since we only have a single predictor in this model we can create a Binary Fitted Line Plot to visualize the sigmoidal shape of the fitted logistic regression curve: Odds, Log Odds, and Odds RatioThere are algebraically equivalent ways to write the logistic regression model: The first is \(\begin{equation}\label{logmod1} which is an equation that describes the odds of being in the current category of interest. By definition, the odds for an event is \(\pi\) / (1 - π) such that π is the probability of the event. For example, if you are at the racetrack and there is a 80% chance that a certain horse will win the race, then his odds are 0.80 / (1 - 0.80) = 4, or 4:1. The second is \(\begin{equation}\label{logmod2} which states that the (natural) logarithm of the odds is a linear function of the X variables (and is often called the log odds). This is also referred to as the logit transformation of the probability of success, \(\pi\). The odds ratio (which we will write as \(\theta\)) between the odds for two sets of predictors (say \(\textbf{X}_{(1)}\) and \(\textbf{X}_{(2)}\)) is given by \(\begin{equation*} For binary logistic regression, the odds of success are: \(\begin{equation*} By plugging this into the formula for \(\theta\) above and setting \(\textbf{X}_{(1)}\) equal to \(\textbf{X}_{(2)}\) except in one position (i.e., only one predictor differs by one unit), we can determine the relationship between that predictor and the response. The odds ratio can be any nonnegative number. An odds ratio of 1 serves as the baseline for comparison and indicates there is no association between the response and predictor. If the odds ratio is greater than 1, then the odds of success are higher for higher levels of a continuous predictor (or for the indicated level of a factor). In particular, the odds increase multiplicatively by \(\exp(\beta_{j})\) for every one-unit increase in \(\textbf{X}_{j}\). If the odds ratio is less than 1, then the odds of success are less for higher levels of a continuous predictor (or for the indicated level of a factor). Values farther from 1 represent stronger degrees of association. For example, when there is just a single predictor, \(X\), the odds of success are: \(\begin{equation*} If we increase \(X\) by one unit, the odds ratio is \(\begin{equation*} To illustrate, the relevant Minitab output from the leukemia example is: Odd Ratios for Continuous Predictors
The odds ratio for LI of 18.1245 is calculated as \(\exp(2.89726)\) (you can view more decimal places for the coefficient estimates in Minitab by clicking "Storage" in the Regression Dialog and selecting "Coefficients"). The 95% confidence interval is calculated as \(\exp(2.89726\pm z_{0.975}*1.19)\), where \(z_{0.975}=1.960\) is the \(97.5^{\textrm{th}}\) percentile from the standard normal distribution. The interpretation of the odds ratio is that for every increase of 1 unit in LI, the estimated odds of leukemia remission are multiplied by 18.1245. However, since the LI appears to fall between 0 and 2, it may make more sense to say that for every .1 unit increase in L1, the estimated odds of remission are multiplied by \(\exp(2.89726\times 0.1)=1.336\). Then
Notice that \(1.336\times 0.232=0.310\), which demonstrates the multiplicative effect by \(\exp(0.1\hat{\beta_{1}})\) on the odds. Likelihood Ratio (or Deviance) TestThe likelihood ratio test is used to test the null hypothesis that any subset of the \(\beta\)'s is equal to 0. The number of \(\beta\)'s in the full model is p, while the number of \(\beta\)'s in the reduced model is r. (Remember the reduced model is the model that results when the \(\beta\)'s in the null hypothesis are set to 0.) Thus, the number of \(\beta\)'s being tested in the null hypothesis is \(p-r\). Then the likelihood ratio test statistic is given by: \(\begin{equation*} where \(\ell(\hat{\beta})\) is the log likelihood of the fitted (full) model and \(\ell(\hat{\beta}^{(0)})\) is the log likelihood of the (reduced) model specified by the null hypothesis evaluated at the maximum likelihood estimate of that reduced model. This test statistic has a \(\chi^{2}\) distribution with \(p-r\) degrees of freedom. Minitab presents results for this test in terms of "deviance," which is defined as \(-2\) times log-likelihood. The notation used for the test statistic is typically \(G^2\) = deviance (reduced) – deviance (full). This test procedure is analogous to the general linear F test procedure for multiple linear regression. However, note that when testing a single coefficient, the Wald test and likelihood ratio test will not, in general, give identical results. To illustrate, the relevant software output from the leukemia example is: Deviance Table
Since there is only a single predictor for this example, this table simply provides information on the likelihood ratio test for LI (p-value of 0.004), which is similar but not identical to the earlier Wald test result (p-value of 0.015). The Deviance Table includes the following:
When using the likelihood ratio (or deviance) test for more than one regression coefficient, we can first fit the "full" model to find deviance (full), which is shown in the "Error" row in the resulting full model Deviance Table. Then fit the "reduced" model (corresponding to the model that results if the null hypothesis is true) to find deviance (reduced), which is shown in the "Error" row in the resulting reduced model Deviance Table. For example, the relevant Deviance Tables for the Disease Outbreak example on pages 581-582 of Applied Linear Regression Models (4th ed) by Kutner et al are: Full model
Reduced Model
Here the full model includes four single-factor predictor terms and five two-factor interaction terms, while the reduced model excludes the interaction terms. The test statistic for testing the interaction terms is \(G^2 = 101.054-93.996 = 7.058\), which is compared to a chi-square distribution with \(10-5=5\) degrees of freedom to find the p-value > 0.05 (meaning the interaction terms are not significant). Alternatively, select the corresponding predictor terms last in the full model and request Minitab to output Sequential (Type I) Deviances. Then add the corresponding Sequential Deviances in the resulting Deviance Table to calculate \(G^2\). For example, the relevant Deviance Table for the Disease Outbreak example is:
The test statistic for testing the interaction terms is \(G^2 = 4.570+1.015+1.120+0.000+0.353 = 7.058\), the same as in the first calculation. Goodness-of-Fit TestsOverall performance of the fitted model can be measured by several different goodness-of-fit tests. Two tests that require replicated data (multiple observations with the same values for all the predictors) are the Pearson chi-square goodness-of-fit test and the deviance goodness-of-fit test(analagous to the multiple linear regression lack-of-fit F-test). Both of these tests have statistics that are approximately chi-square distributed with c - p degrees of freedom, where c is the number of distinct combinations of the predictor variables. When a test is rejected, there is a statistically significant lack of fit. Otherwise, there is no evidence of lack of fit. By contrast, the Hosmer-Lemeshow goodness-of-fit test is useful for unreplicated datasets or for datasets that contain just a few replicated observations. For this test the observations are grouped based on their estimated probabilities. The resulting test statistic is approximately chi-square distributed with c - 2 degrees of freedom, where c is the number of groups (generally chosen to be between 5 and 10, depending on the sample size). To illustrate, the relevant Minitab output from the leukemia example is: Goodness-of-Fit Test
Since there is no replicated data for this example, the deviance and Pearson goodness-of-fit tests are invalid, so the first two rows of this table should be ignored. However, the Hosmer-Lemeshow test does not require replicated data so we can interpret its high p-value as indicating no evidence of lack-of-fit. \(R^{2}\)The calculation of \(R^{2}\) used in linear regression does not extend directly to logistic regression. One version of \(R^{2}\) used in logistic regression is defined as \(\begin{equation*} where \(\ell(\hat{\beta_{0}})\) is the log likelihood of the model when only the intercept is included and \(\ell_{S}(\beta)\) is the log likelihood of the saturated model (i.e., where a model is fit perfectly to the data). This \(R^{2}\) does go from 0 to 1 with 1 being a perfect fit. With unreplicated data, \(\ell_{S}(\beta)=0\), so the formula simplifies to: \(\begin{equation*} To illustrate, the relevant Minitab output from the leukemia example is: Model Summary
Recall from above that \(\ell(\hat{\beta})=-13.0365\) and \(\ell(\hat{\beta}^{(0)})=-17.1859\), so: \(\begin{equation*} Note that we can obtain the same result by simply using deviances instead of log-likelihoods since the \(-2\) factor cancels out: \(\begin{equation*} Raw ResidualThe raw residual is the difference between the actual response and the estimated probability from the model. The formula for the raw residual is \(\begin{equation*} Pearson ResidualThe Pearson residual corrects for the unequal variance in the raw residuals by dividing by the standard deviation. The formula for the Pearson residuals is \(\begin{equation*} Deviance ResidualsDeviance residuals are also popular because the sum of squares of these residuals is the deviance statistic. The formula for the deviance residual is \(\begin{equation*} Here are the plots of the Pearson residuals and deviance residuals for the leukemia example. There are no alarming patterns in these plots to suggest a major problem with the model. Hat ValuesThe hat matrix serves a similar purpose as in the case of linear regression – to measure the influence of each observation on the overall fit of the model – but the interpretation is not as clear due to its more complicated form. The hat values (leverages) are given by \(\begin{equation*} where W is an \(n\times n\) diagonal matrix with the values of \(\hat{\pi}_{i}(1-\hat{\pi}_{i})\) for \(i=1 ,\ldots,n\) on the diagonal. As before, we should investigate any observations with \(h_{i,i}>3p/n\) or, failing this, any observations with \(h_{i,i}>2p/n\) and very isolated. Studentized ResidualsWe can also report Studentized versions of some of the earlier residuals. The Studentized Pearson residuals are given by \(\begin{equation*} and the Studentized deviance residuals are given by \(\begin{equation*} C and \(\bar{\textrm{C}}\)C and \(\bar{\textrm{C}}\) are extensions of Cook's distance for logistic regression. C measures the overall change in fitted logits due to deleting the \(i^{\textrm{th}}\) observation for all points including the one deleted, while \(\bar{\textrm{C}}\) excludes the deleted point. They are defined by: \(\begin{equation*} and \(\begin{equation*} To illustrate, the relevant Minitab output from the leukemia example is: Fits and Diagnostics for Unusual Observations
Fits and Diagnostics for Unusual Observations
R Large residual The default residuals in this output (set under Minitab's Regression Options) are deviance residuals, so observation 8 has a deviance residual of \(-1.945\), a studentized deviance residual of \(-2.11\), a leverage (h) of \(0.149840\), and a Cook's distance (C) of 0.58. DFDEV and DFCHIDFDEV and DFCHI are statistics that measure the change in deviance and in Pearson's chi-square, respectively, that occurs when an observation is deleted from the data set. Large values of these statistics indicate observations that have not been fitted well. The formulas for these statistics are \(\begin{equation*} and \(\begin{equation*} Which type of promotional activities refers to non personal paid communication that utilizes media?Publicity is the form of public relations that entails nonpersonal communication passed on via various media but not paid for an identified sponsor.
Which promotional method is most likely to be used in a push policy?A push strategy tries to sell directly to the consumer, bypassing other distribution channels. An example of this would be selling insurance or holidays directly. With this type of strategy, consumer promotions and advertising are the most likely promotional tools.
What is promotional advertising quizlet?Promotional advertising. Advertising designed to increase sales. Institutional advertising. The process used to try to create a favorable image for a company and foster goodwill in the marketplace.
Which of the following is the major goal of integrated marketing communication IMC in regard to its marketing messages?The goal of integrated marketing communications is to ensure that customers receive the same message when they interact with a company's brand in various ways.
|