Review: Margins & Graph Design (Stata)

Margins

When we talk about margins, we are really talking about two separate things this one command can produce: predicted values OR marginal effects. Both help us illustrate for our reader the effects we have found in our model. They can help convey effect size or interactions. If you master the margins command, you can use it to highlight a particular finding for your reader. You can list the predicted values or marginal effects as numbers or plot them. I will display each of the examples via plots.

What is a predicted value?
Remember, all the models we run are based on equations that are trying to predict our dependent variable (aka our outcome - Y). Once we run the model, we actually get an equation we can use. If we plug in specific values for our independent variables (our X's), then we can predict the value of our outcome. In a linear regression, that outcome is some continuous variable: income, years in the NBA, life expectancy, etc. In a regression with a binary outcome, the predicted value is actually a predicted probability. In a regression with a count outcome, the predicted value is, you guessed it, a count.

To get a predicted value out of the margins command, you don't have to add anything special! This is the default for the margins command.

What is a marginal effect?
A marginal effect tells us how much our outcome (Y) changes based on a one unit change in X. This hopefully sounds familiar to you. That's because it's how we interpret coefficients in a linear regression. Basically, a marginal effect is a slope. In a linear regression, the slope is constant. A one unit change in X is associated with the same change in Y at any value of X. In other models, the slope changes as you go up the range of X. There, the marginal effect is the instantaneous rate of change, meaning the slope at a particular point. This harkens back to your calculus classes. In the image below, the dashed line shows you the slope at the point where the arrow ends.

To get a marginal effect out of the margins command, you need to add the dydx() option to your margins command. That's that calculus terminology.

Now what makes the margins command difficult is that there are so so many options we can add to the command. Do we want to focus on one of our independent variables or two? Are we holding the other variables at means or at representative values? Are we working with continuous or categorical variables? This overview today is hopefully going to help you understand the grammar of this command and how to bend it to your will when you need to use it on your project.

There are four decisions you will need to make when producing margins. These four decisions form the basics of the many variations of the margins command. All of these examples will be from a logistic regression so they will be predicted probabilities or changes in probability.

To use the margins command, we need to run a regression first. This is the regression we'll be basing the following commands off of:

    logistic survived i.port i.female log_fare parch
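To make the idea of "plugging in values" concrete, here is a minimal sketch that computes one predicted probability by hand from the fitted equation and then asks margins for the same quantity. It assumes the Titanic data are already in memory with the variable names used in the regression, and that port is coded 1, 2, 3 with Cherbourg (1) as the base category, as the output later in this review suggests.

```stata
* Assumption: the data are in memory and the regression runs as shown above
logistic survived i.port i.female log_fare parch

* Linear prediction (log-odds) for a woman who boarded at Cherbourg
* (assumed base category, so no port coefficient is added),
* with log_fare = 2 and parch = 2
scalar xb = _b[_cons] + _b[1.female] + 2*_b[log_fare] + 2*_b[parch]

* Convert the log-odds to a probability
display invlogit(xb)

* Ask margins for the same predicted probability
margins, at(port = 1 female = 1 log_fare = 2 parch = 2)
```

The two numbers should agree, which is a useful reminder that margins is just evaluating the model equation at the values you give it.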

Decision 1: Categorical or Continuous?

The first decision you will need to make when using margins is which independent (X) variable to focus on. Remember, you as the analyst are using the margins or margins plot to visualize some finding, usually relating to a key independent variable in your analysis. For example, in our Titanic analysis our key variable was port of boarding. With the margins command, the syntax will change depending on whether your X variable of focus is categorical or continuous.

For a categorical variable, list the variable name BEFORE the comma. If you are not specifying any other options, you don't need a comma at all.

Here is the margins command and margins plot, focusing on port of boarding.
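The command itself does not appear here, so the following is a reconstruction from the rule just stated rather than the original code. With a categorical focus variable and no other options, it would presumably be:

```stata
* Reconstructed from the rule above (not the original code):
* categorical focus variable goes before the comma; no options, so no comma
margins port
marginsplot
```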

For a continuous variable, the variable name will be listed AFTER the comma inside of an at() option statement and you must specify which values of X Stata should calculate/plot the margins for.

Here is the margins command and margins plot, focusing on ticket fare (logged to address skew).

    margins, at(log_fare = (0 2 4 6))
    marginsplot

Decision 2: Add a second variable?

The second decision you will need to make is whether you want to add a second variable to focus on. Any more than two X variables of focus will lead to a complicated plot that doesn't communicate much to your reader. But adding a second variable can help you show not only the effect of your first variable, but how that effect varies across groups. It can illustrate interactions. Again, whether your variables are continuous or categorical will affect how you do this command.

Continuous and Continuous: You will specify both variables AFTER the comma inside of the same at() and you must specify the multiple values of each variable to calculate/plot. The first variable will plot on the X axis and the second variable will be plotted as different lines.

Here is the margins command and margins plot for ticket fare and number of parents/children on board.

    margins, at(log_fare = (0 2 4 6) parch = (0 3 6))
    marginsplot

Continuous and Categorical: You will specify the categorical variable BEFORE the comma and the continuous variable AFTER the comma within the at() option statement. If you only want to include certain categories, you can move the categorical variable into the at() statement and specify the categories to include.

Here is the margins command and margins plot for ticket fare and port of boarding.

    margins port, at(log_fare = (0 2 4 6))
    marginsplot

And here is the same plot, but with only Cherbourg and Queenstown.

    margins, at(log_fare = (0 2 4 6) port = (1 2))
    marginsplot

Categorical and Categorical: You will specify the variable you want on the X axis BEFORE the comma and the variable you want split across different lines AFTER the comma in the over() option statement.

Here is the margins command and margins plot for port of boarding and sex of passenger, with port on the X axis and sex as lines.

    margins port, over(female)
    marginsplot
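One thing worth knowing about over(): it computes the margins separately within each subgroup, so the female line is averaged over the women in the data and the male line over the men. A common alternative, shown here only as a sketch and not as the approach used in this review, is to list both variables as an interaction before the comma. That treats everyone in the sample as if they belonged to each port-by-sex combination in turn.

```stata
* Alternative sketch (not the command used above):
* margins for every combination of port and sex, averaging over the full sample
margins port#female
marginsplot
```

The two approaches can give somewhat different numbers when the other X variables are distributed differently for men and women, so pick the one that matches the comparison you want to make.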

Decision 3: How will you handle the other X variables?

This is a review from last week. There are three approaches to handling the other X variables (aka the ones that are NOT the X variable you want to highlight):

  • Hold other variables AT MEANS
  • Hold other variables AT REPRESENTATIVE VALUES
  • Run everything with observed values and compute the AVERAGE predicted value/effect.

NOTE: For these I want you to note how the actual predicted values change, so I am not going to include the margins plot. Look at the actual calculated margins for each approach.

At means: You just have to add the atmeans option after the comma.

Here are the margins for port with the other variables at means.
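The command that produced this output is not shown, so the line below is a reconstruction from the rule just stated rather than the original code:

```stata
* Reconstructed (not the original code): hold all other X variables at their means
margins port, atmeans
```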

    Adjusted predictions                                       Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()
    At: 1.port   = .1889764 (mean)
        2.port   = .0866142 (mean)
        3.port   = .7244094 (mean)
        0.female = .6490439 (mean)
        1.female = .3509561 (mean)
        log_fare = 2.959024 (mean)
        parch    = .3824522 (mean)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            port |
      Cherbourg  |    .453225    .049525     9.15   0.000     .3561577    .5502923
     Queenstown  |   .3466006   .0648861     5.34   0.000     .2194261    .4737751
    Southampton  |   .3286141     .02262    14.53   0.000     .2842797    .3729486
    ------------------------------------------------------------------------------

At representative values: You have to specify the specific values of the other variables within the at() option, but only ONE value per variable.

Here are the margins for port with the other variables at representative values.

    margins port, at(female = 1 log_fare = 2 parch = 2)

    Adjusted predictions                                       Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()
    At: female   = 1
        log_fare = 2
        parch    = 2

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            port |
      Cherbourg  |   .5240158   .0825546     6.35   0.000     .3622117    .6858199
     Queenstown  |   .4133268   .0845754     4.89   0.000      .247562    .5790915
    Southampton  |   .3939654   .0596048     6.61   0.000     .2771421    .5107888
    ------------------------------------------------------------------------------

Average predicted values/marginal effects: Don't do a damn thing! This is the default.

Here are the average predicted probabilities for port of boarding.
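The command is not shown for this output either. Since averaging over observed values is the default, it would presumably be nothing more than the following (a reconstruction, not the original code):

```stata
* Reconstructed (not the original code): no at() or atmeans,
* so margins averages over the observed values
margins port
```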

    Predictive margins                                         Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            port |
      Cherbourg  |   .4495649    .033027    13.61   0.000     .3848331    .5142967
     Queenstown  |   .3781069   .0442857     8.54   0.000     .2913085    .4649054
    Southampton  |   .3657791   .0155409    23.54   0.000     .3353196    .3962387
    ------------------------------------------------------------------------------

Decision 4: Predicted values or Marginal effects

And finally, we can specify whether we want the predicted values or the marginal effects. All of the above examples were predicted values. Specifically they were predicted probabilities because we're working with logistic regression.

Predicted values: Again, don't do a damn thing! This is the default for the margins command.

Here again are the average predicted probabilities at different values of ticket fare (logged).

    margins, at(log_fare = (2 4 6))

    Predictive margins                                         Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()
    1._at: log_fare = 2
    2._at: log_fare = 4
    3._at: log_fare = 6

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             _at |
              1  |   .2676895   .0207842    12.88   0.000     .2269532    .3084257
              2  |   .5118238   .0262607    19.49   0.000     .4603537    .5632939
              3  |   .7744947   .0526051    14.72   0.000     .6713907    .8775987
    ------------------------------------------------------------------------------

Marginal effects: When you do marginal effects, you move your variable of focus, regardless of whether it is categorical or continuous, AFTER the comma inside the dydx() option statement.

Here is the marginal effect for ticket fare (logged).
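The command for this output is not shown. Based on the rule above, and on the "Average marginal effects" header in the output, it would presumably be the following (a reconstruction, not the original code):

```stata
* Reconstructed (not the original code): average marginal effect of logged fare
margins, dydx(log_fare)
```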

    Average marginal effects                                   Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()
    dy/dx wrt:  log_fare

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        log_fare |   .1111977   .0158007     7.04   0.000     .0802289    .1421665
    ------------------------------------------------------------------------------

You can get one marginal effect for a continuous variable OR get the marginal effects across the range if you so choose:

    margins, dydx(log_fare) at(log_fare = (2 4 6))

    Average marginal effects                                   Number of obs = 889
    Model VCE: OIM

    Expression: Pr(survived), predict()
    dy/dx wrt:  log_fare
    1._at: log_fare = 2
    2._at: log_fare = 4
    3._at: log_fare = 6

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    log_fare     |
             _at |
              1  |   .1043584   .0131838     7.92   0.000     .0785187    .1301981
              2  |   .1365486   .0224777     6.07   0.000      .092493    .1806041
              3  |   .1116513   .0047204    23.65   0.000     .1023996    .1209031
    ------------------------------------------------------------------------------
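The examples above use a continuous variable, but the same rule applies to a categorical one. As a short sketch (this command is not part of the original examples): when a factor variable goes inside dydx(), Stata reports the discrete change in the predicted probability for each category relative to the base category instead of a slope.

```stata
* Sketch: marginal effects for a categorical variable.
* Each row is the change in Pr(survived) compared with the base port
* (assumed to be Cherbourg, the first category in the output above).
margins, dydx(port)
marginsplot
```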

Basics of Graphing Aesthetics

This lab will not review in detail all of the many different ways you can change the graphs you produce in Stata. However, I will provide you with this reference do file for graph formatting in Stata. It covers:

  • Titles, subtitles, and captions
  • Changing axis and tick mark options
  • Color of markers, lines, and the fill area
  • Style of markers and lines
  • Background colors for the plot area and graph area
  • Labelling specific values

Download formatref.do

Here is a resource just to learn more about adjusting plots with grstyle:
Download grstyle_StataGraphsMadeEasy.pdf

Good Graph Design

Example: A journal ready predicted values plot

Here is one example of a journal ready plot. This is code you can tweak and save to be your default style when producing plots for papers. I pulled this shortcut formatting from various places online, and it is in the reference .do file I put above.

We will use the grstyle shortcuts for changing the default settings for graphs. This is going to do the bulk of the formatting work for you, and would make your plot style consistent if you wanted to make several graphs for the same paper.

    * Install some helper commands (if you haven't already installed them)
    net install http://www.stata-journal.com/software/sj18-3/gr0073/
    ssc install grstyle
    ssc install palettes
    ssc install colrspace

    * Set the new default styles for your graph
    grstyle clear // clear the grstyle settings
    set scheme s2color // sets the color scheme
    grstyle init // initiates the gr style command
    grstyle set plain, box // create a plain style, with a box around the plot
        * 'grid' is another good option you can play with
    grstyle color background white // turn the background white
    grstyle set color mono // set a monochrome color scheme for points/lines
    grstyle set color Dark2, n(3) // set a color scheme for colored lines/points
        * other color schemes include Set1, Set2, RdYlGn, Dark1, Dark2
        * I set the number of colors I need in the plot in ', n(3)'
    grstyle yesno draw_major_hgrid yes // include major grid lines
    grstyle yesno draw_major_ygrid yes // include major grid lines
    grstyle set legend 10, box inside // move the legend inside the plot area
        * for this last command the number after legend refers to the corner
        * of the plot (1 to at least 12 like a clock). I wanted it in the upper
        * left. I just played around with numbers til I got it where I wanted.
    grstyle set size 14pt: axis_title // change the size of axis text

I will calculate the margins quietly:

    quietly margins, at(log_fare = (0 2 4 6) parch = (0 3 6))

And then here is the final plot. I don't have to change many options, because all the grstyle options I used above did that work. I left the plot title blank, because often you will manually title plots in your manuscript. You can add a title if you wish.

    marginsplot, ///
        title("") ///
        ytitle("Predicted Probabilities") ///
        xtitle("Ticket Fare (logged)") ///
        legend(order(1 "0 parents/children" 2 "3 parents/children" ///
            3 "6 parents/children"))

    graph export figs_output/probplot.png // save plot to your files
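Two small things can trip up the export step: the output folder may not exist yet, and Stata will not overwrite an existing file unless you tell it to. Here is a hedged sketch of a more forgiving version. The folder name is the one used above; the width of 2000 pixels is just an illustrative value, not a requirement.

```stata
* Create the output folder if it is not already there
* (capture suppresses the error when the folder exists)
capture mkdir figs_output

* Overwrite any earlier version and export at a larger pixel width
* (width(2000) is an arbitrary illustrative choice)
graph export figs_output/probplot.png, replace width(2000)
```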

Lab Assignment

From this logistic regression, produce a journal ready predicted probability plot with two X variables of focus (you can choose any two X variables from the regression). You can play around with grstyle or the direct commands from the reference file.