Using BRcal to Assess and Embolden Probability Forecasts

Last Updated June 16, 2024

This vignette demonstrates how to use the BRcal package via a case study involving hockey home team win probability predictions and game outcomes. The core features of this package are (1) probability calibration assessment via bayes_ms() and llo_lrt(), (2) MLE (Maximum Likelihood Estimation) recalibration via mle_recal(), and (3) boldness-recalibration via brcal(). Supporting features include visualizations via lineplot() and plot_params() and the linear in log odds (LLO) function (LLO()) for manual adjustment of probability predictions. For more details on the methodology under the hood of the BRcal package, see Guthrie and Franck (2024). Note that in many cases the output is truncated with ... for readability.

Installation

There are three ways to install the package. The first is to install it directly from GitHub via the snippet below.

# Install via github
devtools::install_github("apguthrie/BRcal")

The second way is to install the package from the most recent tarball available on GitHub. The tarball must first be downloaded and the first argument in the example below must be the file path to the tarball.

# Install via tarball
install.packages("path_to_file\BRcal_0.0.0.9000.tar.gz", repos = NULL, type="source")

The third and simplest way to install the package is directly from CRAN (upon acceptance) via the usual install.packages.

# Install via CRAN 
install.packages("BRcal")

Once the package is installed, it can be loaded with the usual library() call.

library(BRcal)

Illustrative Data Analysis with Core Functionality

In this section, we demonstrate the core functionality of the BRcal package.

Loading the Data

The hockey data we will use throughout this vignette is directly available through the BRcal package in the object called hockey and can be loaded as follows:

data("hockey")

The data contains probability predictions and outcomes for all 868 regular season games in the 2020-21 National Hockey League (NHL) season. The column descriptions are as follows:

The predictions from FiveThirtyEight (2021) can also be found on their website. The random noise forecaster predictions were generated via random uniforms draws from 0.26 to 0.77 (the observed range in the FiveThirtyEight data) to mimic the behavior of a completely uninformed forecaster.

Common Convention Across the BRcal Package

The functions in the BRcal package share a lot of common arguments. Two that we want to explain are x and y, where x is a vector of probabilities predictions of binary events and y is the corresponding true event outcomes. Since we are dealing with probabilities, x must contain numeric values from 0 to 1 (inclusive). And as we are dealing with binary events, y must only take on two values, i.e. the two possible outcomes.

By default, the functions in BRcal expect y to be a vector of 0s and 1s where a 1 represents a “event” and 0 represents a “non-event”. Additionally, these functions expect that the probabilities in x are the probabilities of “event” at each observation. For demonstration of how to pass y as a binary vector that contains values other than 0s and 1s and properly specify event, see the Specifying event Section.

Assessing Calibration via bayes_ms() and llo_lrt()

The BRcal package provides two approaches to assessing calibration: an approximate Bayesian model selection-based approach implemented in bayes_ms() and a likelihood ratio test for calibration in llo_lrt().

Bayesian Model Selection-based approach (bayes_ms())

We’ll first look at the Bayesian model selection-based approach to calibration assessment using bayes_ms(). The snippet below uses this function to assess the calibration of FiveThirtyEight’s predictions in hockey$x.

bt538 <- bayes_ms(hockey$x, hockey$y)
bt538
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730538
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 0.9453966 1.4005730
> 
> $optim_details
> $optim_details$par
> [1] 0.9453966 1.4005730
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       63       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

Notice this function returns a list, which we’ve stored in bt538 for later use. The first element in the returned list is Pmc which shows the prior probability of the calibrated model.

The next two elements in the return list are BIC_Mc and BIC_Mu which are the Bayesian Information Criteria (BIC) for the calibrated model and the uncalibrated model, respectively. These are used in the approximation for the Bayes Factor comparing the uncalibrated model to the calibrated model, which is returned in list element BF.

Next, posterior_model_prob is the posterior model probability for the calibrated model given the outcomes y. This serves as the assessment measure of calibration. In this case, since the posterior model probability is high, we consider the predictions from FiveThirtyEight to be well calibrated.

The maximum likelihood estimates (MLEs) for \(\delta\) and \(\gamma\) are returned in element MLEs. These parameters are used throughout the package to govern the adjustment of probability predictions (i.e. MLE-recalibration, boldness-recalibration, and LLO-adjustment). Additionally, whenever MLEs are found for \(\delta\) and \(\gamma\) throughout the package, the optim function is used to perform the optimization.

The final list element is called optim_details, which itself is a list returned by the call to optim() to get the MLEs. Note that we expect to see an NA returned for the number of calls to the gradient function as shown in $optim_details$counts when we use the default algorithm (Nelder-Mead) since it does not use a gradient function. For more details and how to change the call to optim, see the Passing Additional/Different Arguments to optim section.

Setting the prior model probabilities via Pmc

Imagine we wish to assess the extent to which lower prior probabilities of calibration influence results. By default, this function uses assumes equal prior probabilities for the calibrated and uncalibrated models. However, it is important for users to determine what priors make sense for the context of their problem. Argument Pmc allows easy specification of the prior model probability for the calibrated model. Naturally, Pmc must be a value in [0,1] and the prior model probability for the uncalibrated model is taken to be 1 - Pmc.

For example, let’s say it is reasonable to assume that the prior model probability of calibration for FiveThirtyEight is 0.2 instead of 0.5 based on past experience. Then we could use bayes_ms() in the following way.

bayes_ms(hockey$x, hockey$y, Pmc=0.2)
> $Pmc
> [1] 0.2
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730538
> 
> $posterior_model_prob
> [1] 0.962536
> 
> $MLEs
> [1] 0.9453966 1.4005730
> 
> $optim_details
> $optim_details$par
> [1] 0.9453966 1.4005730
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       63       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

Note that while many of the elements returned are unchanged (as Pmc is not used in their calculation), BF and posterior_model_prob are different than the previous call.

Likelihood Ratio Test for Calibration (llo_lrt())

Now let’s look at the likelihood ratio test for calibration using llo_lrt(). The null hypothesis for this test is that the predictions x are well calibrated. The alternative is that they are not well calibrated.

The snippet below uses this function to test the calibration of FiveThirtyEight’s predictions in hockey$x and returns a list.

llo_lrt(hockey$x, hockey$y)
> $test_stat
> [1] 4.267411
> 
> $pval
> [1] 0.1183977
> 
> $MLEs
> [1] 0.9453966 1.4005730
> 
> $optim_details
> $optim_details$par
> [1] 0.9453966 1.4005730
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       63       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

The first element test_stat in the returned list is the test statistic under this likelihood ratio test.

The element pval is the p-value associated with this test and can be used to determine if the predictions in x are well calibrated. In this case, since the p-value is relatively high, we will fail to reject the null that FiveThirtyEight’s predictions are well calibrated, which is consistent with the conclusion from bayes_ms().

The elements in MLEs and optim_details are the same as described in the section for bayes_ms().

MLE Recalibration

To get the MLE recalibrated set (i.e. adjust the probability predictions in xvia the Linear in Log Odds (LLO) function such that calibration is maximized), we provide the mle_recal function. Below is an example of using mle_recal() with the hockey data.

mle_recal(hockey$x, hockey$y)
> $probs
>   [1] 0.5633646 0.6814612 0.5628440 0.5117077 0.5675526 0.4223768 0.3802169
>   [8] 0.4748424 0.5231574 0.3000678 0.5266953 0.5292678 0.4957091 0.5647970
>  [15] 0.6538569 0.4845104 0.4090460 0.5792843 0.6937447 0.6000152 0.3961722
...

> [855] 0.7210362 0.6593455 0.4586214 0.7750603 0.5841900 0.4826292 0.4080026
> [862] 0.6701504 0.6561462 0.4814185 0.7421488 0.6786381 0.3255808 0.4814569
> 
> $MLEs
> [1] 0.9453966 1.4005730
> 
> $optim_details
> $optim_details$par
> [1] 0.9453966 1.4005730
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       63       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

Specifying probs_only = TRUE

In some cases, you may just want the recalibrated probabilities and without any of the other items in the return list. In this case, you can set probs_only=TRUE and just the vector of probabilities will be returned. Note that it’s still recommended that you check optim_details to verify convergence before using this option. Below is the result of using probs_only=TRUE.

mle_recal(hockey$x, hockey$y, probs_only=TRUE)
>   [1] 0.5633646 0.6814612 0.5628440 0.5117077 0.5675526 0.4223768 0.3802169
>   [8] 0.4748424 0.5231574 0.3000678 0.5266953 0.5292678 0.4957091 0.5647970
>  [15] 0.6538569 0.4845104 0.4090460 0.5792843 0.6937447 0.6000152 0.3961722
...

> [855] 0.7210362 0.6593455 0.4586214 0.7750603 0.5841900 0.4826292 0.4080026
> [862] 0.6701504 0.6561462 0.4814185 0.7421488 0.6786381 0.3255808 0.4814569

Visualizing MLE-recalibrated Probabilities via lineplot()

We can use the lineplot() function to visualize how the predicted probabilities change from the original set to the MLE recalibrated set.

lineplot(hockey$x, hockey$y)

Notice the left hand column of points are the original probability predictions. The original probabilities are connect by a line to the MLE recalibrated probabilities in the right hand column. These points and lines are colored based on the corresponding outcome in y where blue corresponds to an “event” (i.e. a home team win) and red corresponds to a “non-event” (i.e. a home team loss). For more details on modifying this plot and additional features, see the Visualizing Recalibrated Probability Forecasts via lineplot() Section.

Boldness-Recalibration via brcal()

The brcal functions allows users to boldness-recalibrate x (Guthrie and Franck 2024). More specifically, we are maximizing the standard deviation of the predictions while ensuring the posterior model probability of calibration is at least t. By default, brcal performs 95% boldness recalibration (i.e. t is set to 0.95). Below is an example of 90% boldness-recalibration for FiveThirtyEight’s predictions.

br90 <- brcal(hockey$x, hockey$y, t=0.9)
> iteration: 1
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 3
>   x = ( 0.937285, 1.478607 )
>   f(x) = -0.130028
>   g(x) = -0.098758
...

> iteration: 175
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = 0.000000
> iteration: 176
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = 0.000000

Notice there would be quite a bit of output from this call if it were not truncated (4 lines for each of the 176 iterations). This is coming from a call to the nloptr function from the nloptr package under the hood of brcal (Johnson 2007). The nloptr package is an R interface to NLopt, which is an excellent open-source nonlinear optimization library (Johnson 2007). Since both our objective and constraint functions are are nonlinear with respect to parameters \(\delta\) and \(\gamma\), we leverage the nloptr package for the optimization rather than optim(). The output printed at each iteration shows (1) the current values of the parameter estimates, (2) the value of the objective function, and (3) the value of the constraint function.

Next, let’s look at the list returned by brcal(). The first entry nloptr is exactly the object returned from nloptr() when the optimizer stops. This provides useful information about convergence and should always be checked by users to ensure that the algorithm converged properly. Next in the list are Pmc and t, which are just the prior model probability and constraint on probability as specified by the user (or the default values). The solution for the boldness-recalibration parameters is found in BR_params. The achieved maximal spread is sb. Lastly, brcal() returns the vector of boldness-recalibrated probabilities in probs.

br90
> $nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(0.945397, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_AUGLAG", maxeval = 500, 
>         maxtime = -1, xtol_rel = 1e-06, print_level = 3, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 1e-06)), t = 0.9, tau = FALSE, Pmc = 0.5, 
>     epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 176 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-06 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010301103792 
> Optimal value of controls: 0.8662523 2.011972
> 
> 
> 
> $Pmc
> [1] 0.5
> 
> $t
> [1] 0.9
> 
> $BR_params
> [1] 0.8662523 2.0119717
> 
> $sb
> [1] 0.1690103
> 
> $probs
>   [1] 0.57521324 0.73683075 0.57447030 0.50109237 0.58118451 0.37458745
>   [7] 0.31759475 0.44828613 0.51755392 0.21761392 0.52264063 0.52633875
>  [13] 0.47812060 0.57725659 0.70072899 0.46208535 0.35630619 0.59785587
...

> [859] 0.60479909 0.45939663 0.35488466 0.72219848 0.70377197 0.45766716
> [865] 0.81088037 0.73320058 0.24804606 0.45772207

Visualizing Boldness-Recalibrated Probabilities via lineplot()

We can again use the lineplot() function to visualize how the predicted probabilities change from the original set to varying levels of boldness-recalibrated sets. Below we visualize 95% and 90% boldness-recalibration. By default, these will also be compared to the MLE-recalibrated set.

lineplot(x=hockey$x, y=hockey$y, t_levels = c(0.95, 0.9))

Notice the left hand column of points are the original probability predictions. The original probabilities are connect by a line to the corresponding MLE-recalibrated, 95%, and 90% boldness-recalibrated probabilities in the right hand columns. These points and lines are colored based on the corresponding outcome in y where blue corresponds to an “event” (i.e. a home team win) and red corresponds to a “non-event” (i.e. a home team loss). Notice that the posterior model probabilities on the x-axis are not in ascending or descending order. Instead, they simply follow the ordering of how one might use the BRcal package: first looking at the original predictions, then maximizing calibration, then examining how far they can spread out predictions while maintaining calibration with boldness-recalibration. For more details on modifying this plot and additional features, see the Visualizing Recalibrated Probability Forecasts via lineplot() Section.

Using tau = TRUE

Sometimes the optimizer is more efficient when optimizing over \(\tau = log(\delta)\) instead of \(\delta\). In this case, users can set tau = TRUE. Specification of start location x0 and bounds lb, ub should still be specified in terms of \(\delta\). The brcal function will automatically convert from \(\delta\) to \(\tau\). Let’s try optimizing over \(\tau\) with the hockey data.

br90_tau <- brcal(hockey$x, hockey$y, t=0.9, tau=TRUE)
> iteration: 1
>   x = ( -0.056151, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   x = ( -0.056151, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 3
>   x = ( -0.064262, 1.478607 )
>   f(x) = -0.130024
>   g(x) = -0.098758
...

> iteration: 143
>   x = ( -0.143578, 2.011970 )
>   f(x) = -0.169010
>   g(x) = -0.000002
> iteration: 144
>   x = ( -0.143578, 2.011970 )
>   f(x) = -0.169010
>   g(x) = -0.000002

Notice that the parameter values in the output from nloptr are in terms of \(\tau\) instead of \(\delta\). The same is true of the solution reported in br90_tau$nloptr. In general, all results returned directly from nloptr will reflect whichever scale nloptr() optimized on, whether it be \(\delta\) or \(\tau\). However, BR_params in the returned list will always be reported in terms of \(\delta\).

br90_tau
> $nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(-0.056151, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_AUGLAG", maxeval = 500, 
>         maxtime = -1, xtol_rel = 1e-06, print_level = 3, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 1e-06)), t = 0.9, tau = TRUE, Pmc = 0.5, 
>     epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 144 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-06 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010174422865 
> Optimal value of controls: -0.1435781 2.01197
> 
> 
> 
> $Pmc
> [1] 0.5
> 
> $t
> [1] 0.9
> 
> $BR_params
> [1] 0.8662531 2.0119700
> 
> $sb
> [1] 0.1690102
> 
> $probs
>   [1] 0.57521339 0.73683075 0.57447045 0.50109258 0.58118465 0.37458776
>   [7] 0.31759508 0.44828639 0.51755412 0.21761426 0.52264083 0.52633895
>  [13] 0.47812084 0.57725673 0.70072902 0.46208560 0.35630651 0.59785600
...

> [859] 0.60479921 0.45939688 0.35488498 0.72219849 0.70377199 0.45766742
> [865] 0.81088031 0.73320058 0.24804641 0.45772233

Additional Features and Details

In this section, we demonstrate additional features in the BRcal package and discuss additional details that may be useful to more advanced use-cases.

Passing Additional / Different Arguments to nloptr()

The nloptr package is extremely flexible, as it provides numerous arguments to users for tailoring the optimization to the problem at hand. In the nloptr function, most of these arguments are specified via the opts argument, a convention we partially borrow for brcal(). However, we allow users to directly specify a few arguments that are typically relegated to opts for easier adjustments. All other arguments, except those related to the objective and constraint functions, are available to users by passing a list of these arguments opts. We do not allow users to change the objective or constraint functions, or their Jacobian functions, as these are the backbone of this method and the sole purpose for using the brcal function. For those who wish to optimize a different function or use a different constraint, we recommend using the nloptr function directly.

Start Values

While the algorithm used by optim() was not very sensitive to the starting location of \(\delta\) and \(\gamma\), we’ve found the nature of boldness-recalibration makes nloptr() more sensitive to starting location. By default, brcal() initializes the optimization at the MLEs for \(\delta\) and \(\gamma\). While we’ve found this to be the most stable approach to setting a starting location, users can specify their own starting location by specifying x0 and setting start_at_MLEs=FALSE.

To show the sensitivity of this approach to the starting values, try set x0 to \(\delta = 10\), \(\gamma = -5\) like we did with optim(). The line of code below will run brcal() with these starting values, however we do not run it automatically in this vignette as it causes an error. In this case, the algorithm cannot converge and ends up crashing due to trying extreme values of the parameters. We encourage readers to run this code themselves to verify this result.

try(brcal(hockey$x, hockey$y, x0 = c(10, -5), start_at_MLEs = FALSE))
> iteration: 1
>   x = ( 10.000000, -5.000000 )
>   f(x) = -0.264782
>   g(x) = 0.950000
> iteration: 2
>   x = ( 10.000000, -5.000000 )
>   f(x) = -0.264782
>   g(x) = 0.950000
> iteration: 3
>   x = ( 9.994848, -5.044965 )
>   f(x) = -0.266818
>   g(x) = 0.950000
...

> iteration: 27
>   x = ( 3.550802, -324.404849 )
>   f(x) = -0.476697
>   g(x) = 0.950000
> iteration: 28
>   x = ( 4.390822, -445.720424 )
>   f(x) = -0.477751
>   g(x) = 0.950000
> iteration: 29
>   x = ( nan, nan )
>   f(x) = nan
> Error in optim(par = par, fn = llo_lik, ..., x = x, y = y, neg = TRUE,  : 
>   function cannot be evaluated at initial parameters

Algorithm

By default, the brcal function uses the Augmented Lagrangian Algorithm (AUGLAG) (Conn, Gould, and Toint 1991; Birgin and Martínez 2008), which requires an inner optimization routine. We use Sequential Least-Squares Quadratic Programming (SLSQP) (Kraft 1988, 1994) as the inner optimization routine, by default. For a complete list of inner and outer optimization algorithms, see the NLopt website. Note that not all algorithms can be used with a non-linear constraint (which is necessary for boldness-recalibration). Also note that other algorithms may be more sensitive to starting conditions or may need different stopping criteria to converge.

All changes to the inner algorithm are made via passing a sub-list to local_opts, which itself an entry of the list passed to opts. Instead of SLSQP as the inner algorithm, let’s try the method of moving asymptotes (MMA) (Svanberg 2002).

br90_inMMA <- brcal(hockey$x, hockey$y, t=0.9, 
                    opts=list(local_opts=list(algorithm="NLOPT_LD_MMA")))
> iteration: 1
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
...

> iteration: 104
>   x = ( 0.866252, 2.011971 )
>   f(x) = -0.169010
>   g(x) = -0.000001
> iteration: 105
>   x = ( 0.866252, 2.011971 )
>   f(x) = -0.169010
>   g(x) = -0.000001
br90_inMMA$nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(0.945397, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(local_opts = list(algorithm = "NLOPT_LD_MMA", 
>         eval_f = obj_f, eval_grad_f = obj_grad_f, eval_g_ineq = constr_g, 
>         eval_jac_g_ineq = constr_grad_g, xtol_rel = 1e-06), algorithm = "NLOPT_LD_AUGLAG", 
>         maxeval = 500, xtol_rel = 1e-06, print_level = 3), t = 0.9, 
>     tau = FALSE, Pmc = 0.5, epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 105 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-06 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010270051812 
> Optimal value of controls: 0.8662522 2.011971

Now, let’s try changing the outer algorithm to MMA. This is a great example of an algorithm that is more sensitive to both starting conditions and stopping criteria. In order to get convergence here, we’ll opt to start the optimizer at c(1,2) and relax the relative tolerance. Stopping criteria is discussed in a later section.

br90_outMMA <- brcal(hockey$x, hockey$y, t=0.9, x0=c(1, 2), start_at_MLEs = FALSE,
                     xtol_rel_outer = 1e-05, 
                     opts=list(algorithm="NLOPT_LD_MMA"))
> iteration: 1
>   x = ( 1.000000, 2.000000 )
>   f(x) = -0.166367
>   g(x) = 0.229068
> iteration: 2
>   x = ( 0.938217, 2.023699 )
>   f(x) = -0.168894
>   g(x) = 0.075137
...

> iteration: 15
>   x = ( 0.866256, 2.011972 )
>   f(x) = -0.169010
>   g(x) = 0.000000
> iteration: 16
>   x = ( 0.866238, 2.011969 )
>   f(x) = -0.169010
>   g(x) = -0.000000
br90_outMMA$nloptr
> 
> Call:
> nloptr::nloptr(x0 = c(1, 2), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_MMA", maxeval = 500, xtol_rel = 1e-05, 
>         print_level = 3, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 1e-06, eval_f = obj_f, eval_g_ineq = constr_g)), 
>     t = 0.9, tau = FALSE, Pmc = 0.5, epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 16 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-05 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010299710166 
> Optimal value of controls: 0.8662376 2.011969

Note that many of the algorithms do not use an inner algorithm. In most of these cases, local_opts will be ignored, otherwise nloptr() may throw a warning or error.

Bounds

Following the convention of nloptr(), arguments lb and ub allow users to specify the lower and upper bounds of optimization, respectively. The bounds set using lb and ub are used for both the inner and outer algorithms. By default, we only place a lower bound on \(\delta\) requiring \(\delta > 0\). Similar to setting the bounds of optimization in optim(), setting the lower bound to -Inf and the upper bound to Inf will leave the corresponding parameter unbounded.

For example, the code below shows how to bound \(0.25 < \delta < 5\) and leave \(\gamma\) unbounded.

br90_bound <- brcal(hockey$x, hockey$y, t=0.9, lb=c(0.25, -Inf), ub=c(5, Inf))
> iteration: 1
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
...

> iteration: 175
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = 0.000000
> iteration: 176
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = 0.000000
br90_bound$nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(0.945397, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_AUGLAG", maxeval = 500, 
>         maxtime = -1, xtol_rel = 1e-06, print_level = 3, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 1e-06)), t = 0.9, tau = FALSE, Pmc = 0.5, 
>     epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 176 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-06 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010301103792 
> Optimal value of controls: 0.8662523 2.011972

Stopping criteria

Since there are quite a few options for stopping criteria that can be set via opts in the nloptr function, we select just a few to specify directly in the brcal function. As with optim(), it is important to check convergence and adjust the stopping criteria accordingly. Keep in mind that stopping criteria also work together so several adjustments may be necessary.

Argument maxeval sets the maximum number of iterations (approximately) allowed by the outer optimization routine. Argument maxtime sets the maximum number of seconds (approximately) the outer algorithm will be allowed to run. Argument xtol_rel_outer and xtol_rel_inner set the relative tolerance for the change in \(\delta\) and \(\gamma\) between iterations of the outer and inner algorithms, respectively. Again, nloptr allows several other options for stopping criteria that can be specified via opts.

The example below shows how to set the maximum number of evaluations to 100, the maximum time to 30 seconds, and the relative tolerance of both the inner and outer algorithms to 0.001. These are not necessarily recommended settings, rather they are intended for demonstration purposes only.

br90_stop <- brcal(hockey$x, hockey$y, t=0.9, maxeval=100, maxtime = 30, 
      xtol_rel_outer = 0.001, xtol_rel_inner = 0.001)
> iteration: 1
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   x = ( 0.945397, 1.400573 )
>   f(x) = -0.123926
>   g(x) = -0.098849
...

> iteration: 82
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = -0.000000
> iteration: 83
>   x = ( 0.866252, 2.011972 )
>   f(x) = -0.169010
>   g(x) = -0.000000
br90_stop$nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(0.945397, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_AUGLAG", maxeval = 100, 
>         maxtime = 30, xtol_rel = 0.001, print_level = 3, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 0.001)), t = 0.9, tau = FALSE, Pmc = 0.5, 
>     epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 83 
> Termination conditions:  maxeval: 100 maxtime: 30 xtol_rel: 0.001 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010294069083 
> Optimal value of controls: 0.8662524 2.011972

Suppressing output from nloptr()

To minimize the amount of output printed at each iteration of the optimizer, we leverage print_level from nloptr(). By default, we choose to print the maximum amount of information by setting print_level=3. To simply reduce the output, try print_level=1

br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=1)
> iteration: 1
>   f(x) = -0.123926
> iteration: 2
>   f(x) = -0.123926
> iteration: 3
>   f(x) = -0.130028
...

> iteration: 174
>   f(x) = -0.169010
> iteration: 175
>   f(x) = -0.169010
> iteration: 176
>   f(x) = -0.169010

…or print_level=2.

br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=2)
> iteration: 1
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 2
>   f(x) = -0.123926
>   g(x) = -0.098849
> iteration: 3
>   f(x) = -0.130028
>   g(x) = -0.098758
...

> iteration: 174
>   f(x) = -0.169010
>   g(x) = 0.000000
> iteration: 175
>   f(x) = -0.169010
>   g(x) = 0.000000
> iteration: 176
>   f(x) = -0.169010
>   g(x) = 0.000000

Or to fully suppress, use print_level=0. Notice nothing prints from the following call.

br90 <- brcal(hockey$x, hockey$y, t=0.9, print_level=0)

Passing Additional / Different Arguments to optim()

While optim() is not used for the non-linear constrained optimization for finding the boldness-recalibration parameters, it is used throughout this package wherever the MLEs are needed, including in the calculation for the posterior model posterior (bayes_ms(), llo_lrt(), mle_recal, brcal(), line_plot(), and plot_params()).

While we’ve found that optim() converges in most cases we’ve tried using the default settings in this package, it is important to double check convergence. It is for this reason that we allow users to pass additional arguments to optim and include the returned list from optim() in several of our own functions so users can see the convergence information. If the algorithm did not converge, which can be checked via list elements optim_details$convergence and optim_details$message from bayes_ms(), llo_lrt(), or mle_recal(), users should adjust the arguments passed to optim() using ... to achieve convergence before trusting results. Below are a few examples of how to adjust the call to optim(). See the documentation for optim() for more information.

Conventions for Passing Arguments to optim()

The BRcal package uses two different conventions for passing arguments to optim() depending on the situation. For the functions where optim() is the only function to which we are passing additional arguments (bayes_ms(), llo_lrt(), and mle_recal()), we pass arguments to optim() using the .... The next few sections demonstrate how to pass arguments in this way. While the examples in this section use bayes_ms(), the same conventions apply to llo_lrt() and mle_recal().

For the functions where we pass additional arguments to multiple functions (brcal(), plot_params(), and lineplot()), we cannot leverage the flexibility of .... Instead, users can pass arguments to optim() in the form of a list via optim_options. Demonstration of passing arguments to optim() in this fashion can be found in Passing Arguments via a List.

Start Values

In some cases, the optimization routine may sensitive to the starting location of the optimizer. By default, all calls to optim() in this package use users can change the starting values of \(\delta\) and \(\gamma\) by passing them as a vector via argument par.

In the following example, we will start the optimizer at \(\delta = 10\), \(\gamma = -5\).

bayes_ms(hockey$x, hockey$y, par = c(10, -5))
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730526
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 0.9455187 1.4004799
> 
> $optim_details
> $optim_details$par
> [1] 0.9455187 1.4004799
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       61       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

In this case, the optimizer does not seem overly sensitive to start values, as \(\delta = 10\), \(\gamma = -5\) are not close to the solution (the MLEs) at \(\delta = 0.9455\), \(\gamma = 1.4005\). The algorithm converged and we get the same solution as when we started at \(\delta = 0.5\), \(\gamma = 0.5\).

Stopping Criteria

Sometimes it’s useful to adjust the optimization stopping criteria to improve convergence or refine the solution. We use the default settings in optim(). These can be adjust via the argument control which takes a list with an array of possible components. We defer to the documentation for optim() for further explanation of how to use control. Below is a simple example where we set the relative convergence tolerance (reltol) to 1e-10 instead of the default of sqrt(.Machine$double.eps).

bayes_ms(hockey$x, hockey$y, control=list(reltol=1e-10))
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730632
> 
> $posterior_model_prob
> [1] 0.9903631
> 
> $MLEs
> [1] 0.945395 1.401402
> 
> $optim_details
> $optim_details$par
> [1] 0.945395 1.401402
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>       83       NA 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> NULL

Notice how we arrive at a slightly different solution and there were more function calls than when using the default.

Algorithm and Bounds

It’s not uncommon that some optimization algorithms work better than others depending on the nature of the problem. We use the default algorithm "Nelder-Mead" (Nelder and Mead 1965) which we’ve found to reliable in most cases we’ve tried. This algorithm does not accept bounds on the parameters. To circumvent the fact that \(\delta > 0\), meaning one of our parameters of interest is bounded, we perform the optimization in terms of \(log(\delta)\). Should users prefer to use a bounded algorithm, the algorithm can be specified using method and the bounds can be specified using lower and upper. For example, let’s say we want to use "L-BFGS-B" where we bound \(0 < \delta < 10\) and \(-1 < \gamma < 25\):

bayes_ms(hockey$x, hockey$y, method = "L-BFGS-B", lower = c(0, -1), upper = c(10, 25))
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730632
> 
> $posterior_model_prob
> [1] 0.9903631
> 
> $MLEs
> [1] 0.9453906 1.4013926
> 
> $optim_details
> $optim_details$par
> [1] 0.9453906 1.4013926
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>        8        8 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Or, if we’d rather only impose the lower bound on \(\delta\) and leave \(\gamma\) unbounded, we could use the following snippet:

bayes_ms(hockey$x, hockey$y, method = "L-BFGS-B", 
         lower = c(0, -Inf), upper = c(Inf, Inf))
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730632
> 
> $posterior_model_prob
> [1] 0.9903631
> 
> $MLEs
> [1] 0.9453906 1.4013926
> 
> $optim_details
> $optim_details$par
> [1] 0.9453906 1.4013926
> 
> $optim_details$value
> [1] 572.1846
> 
> $optim_details$counts
> function gradient 
>        8        8 
> 
> $optim_details$convergence
> [1] 0
> 
> $optim_details$message
> [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Suppressing the output from optim()

Once you are confident that the algorithm converges via the arguments you’ve specified, you can set optim_details=FALSE to reduce the output if desired. As long as you continue to use the same settings, the convergence should not change. Below is what the output looks like using the hockey data when optim_details=FALSE.

bayes_ms(hockey$x, hockey$y, optim_details=FALSE)
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730538
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 0.9453966 1.4005730

Passing Arguments via a List

As previously mentioned, the brcal, plot_params, and lineplot functions use optim() under the hood but we choose not to leverage the flexibility of ... since they also pass additional arguments to other functions. For this reason, we instead pass additional arguments to optim() in the form of a list via the parameter optim_options. While we only demonstrate this using brcal() below, the same convention applies to plot_params() and lineplot().

The example below shows how to use optim_options to change the algorithm to "L-BFGS-B", set bounds on \(\delta\) and \(\gamma\), and set the max number of iterations to 200 in the brcal function. Note we specify print_level = 0 and only print the list returned from nloptr() to reduce output.

br <- brcal(hockey$x, hockey$y, t=0.9, print_level = 0,
            optim_options=list(method="L-BFGS-B", 
                               lower=c(0.0001, 10), 
                               upper=c(0.0001, 10), 
                               control=list(maxit=200)))
br$nloptr
> 
> Call:
> 
> nloptr::nloptr(x0 = c(0.945397, 1.400573), eval_f = obj_f, eval_grad_f = obj_grad_f, 
>     eval_g_ineq = constr_g, eval_jac_g_ineq = constr_grad_g, 
>     opts = list(algorithm = "NLOPT_LD_AUGLAG", maxeval = 500, 
>         maxtime = -1, xtol_rel = 1e-06, print_level = 0, local_opts = list(algorithm = "NLOPT_LD_SLSQP", 
>             eval_grad_f = obj_grad_f, eval_jac_g_ineq = constr_grad_g, 
>             xtol_rel = 1e-06)), t = 0.9, tau = FALSE, Pmc = 0.5, 
>     epsilon = 2.22044604925031e-16)
> 
> 
> Minimization using NLopt version 2.7.1 
> 
> NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because 
> xtol_rel or xtol_abs (above) was reached. )
> 
> Number of Iterations....: 83 
> Termination conditions:  maxeval: 500 xtol_rel: 1e-06 
> Number of inequality constraints:  1 
> Number of equality constraints:    0 
> Optimal value of objective function:  -0.169010282899984 
> Optimal value of controls: 0.8662175 2.011966

While not demonstrated here, the options in the sections above that are passed via ... can also be specified in a list passed to optim_options.

Linear in Log Odds (LLO) function LLO()

The LLO function allows you to apply a linear in log odds (LLO) adjustment to predicted probabilities in x. Specifically it shifts the probabilities (on the log odds scale) by delta and scales them by gamma. Note that the value supplied to delta must be greater than 0 and gamma can be any real number, though very extreme values may cause instabilities (which we’ll see later).

By default, this function is used under the hood of all other functions in this package, specifically when making any adjustment to probabilities (i.e. MLE recalibration, boldness-recalibration). While most users of this package probably will not use the LLO() function directly, there are some cases where users may want to do so. One example of this could be when you have estimates for \(\delta\) and \(\gamma\) based on historical data and want to LLO-adjust unlabeled out of sample predictions. A user could use the LLO function to LLO-adjust the new set of predictions by passing them to function argument x, and the parameter estimates to delta and gamma. The following sections provide examples of how to use the LLO function.

Using LLO() with MLEs or Boldness-Recalibration Estimates

Another example of when users may want to use LLO() directly is in getting the MLE recalibrated set. We have already demonstrated how to go about this using mle_recal(). Alternatively, you can get the MLE recalibrated set manually by first getting the MLEs from another function (such as bayes_ms() or llo_lrt()) and then passing those as delta and gamma in LLO(). This may be useful in cases where you have large data and finding the MLEs is slower than desired. If you already have the MLEs from a previous function call to bayes_ms() or llo_lrt() then you don’t need to take the time to re-solve for the MLEs. Additionally, this approach serves as a good alternative to using mle_recal() with probs_only=TRUE.

In the example below, we’ll use the MLEs for FiveThirtyEight’s predictions we saved from bayes_ms() earlier on and pass those to LLO(). Notice the probabilities below are identical to those we got from mle_recal().

hockey$x_mle <- LLO(x=hockey$x, delta=bt538$MLEs[1], gamma=bt538$MLEs[2])
hockey$x_mle
>   [1] 0.5633646 0.6814612 0.5628440 0.5117077 0.5675526 0.4223768 0.3802169
>   [8] 0.4748424 0.5231574 0.3000678 0.5266953 0.5292678 0.4957091 0.5647970
>  [15] 0.6538569 0.4845104 0.4090460 0.5792843 0.6937447 0.6000152 0.3961722
...

> [855] 0.7210362 0.6593455 0.4586214 0.7750603 0.5841900 0.4826292 0.4080026
> [862] 0.6701504 0.6561462 0.4814185 0.7421488 0.6786381 0.3255808 0.4814569

Let’s visualize what this LLO-adjustment looks like by plotting the original probabilities (hockey$x) on the x-axis and the MLE recalibrated probabilities (hockey$x_mle) on the y-axis. Note that the points below are the actual observations and the curve represents how any prediction along the 0-1 range would behave under delta=0.9453966 and gamma=1.400573.

# plot original vs LLO-adjusted via MLEs
ggplot2::ggplot(data=hockey, mapping=aes(x=x, y=x_mle)) +
  stat_function(fun=LLO,
                args=list(delta=bt538$MLEs[1], 
                          gamma=bt538$MLEs[2]),
                geom="line",
                linewidth=1,
                color="black",
                xlim=c(0, 1)) +
  geom_point(aes(color=winner), alpha=0.75, size=2) +
  lims(x=c(0,1), y=c(0,1)) +
  labs(x="Original", y="LLO(x, 2, 2)", color="Winner") +
  theme_bw()

Similarly, say you have already obtained boldness-recalibration estimates but did not save the vector of boldness-recalibrated probabilities. You can use LLO() to boldness-recalibrate with your estimates without taking the time to let the optimizer in brcal() re-solve for these estimates.

In the example below, we’ll use the 90% boldness-recalibration estimates for FiveThirtyEight’s predictions we saved from brcal() earlier on and pass those to LLO(). This essentially achieves the same probabilities in probs returned by brcal() by LLO-adjusting via the returned boldness-recalibration parameters in BR_params returned by brcal().

LLO(x=hockey$x, br90$BR_params[1], br90$BR_params[2])
>   [1] 0.57521324 0.73683075 0.57447030 0.50109237 0.58118451 0.37458745
>   [7] 0.31759475 0.44828613 0.51755392 0.21761392 0.52264063 0.52633875
>  [13] 0.47812060 0.57725659 0.70072899 0.46208535 0.35630619 0.59785587
...

> [859] 0.60479909 0.45939663 0.35488466 0.72219848 0.70377197 0.45766716
> [865] 0.81088037 0.73320058 0.24804606 0.45772207

Extreme Value of delta or gamma

As previously mentioned, using extreme values of delta or gamma can cause instabilities. For example, let’s see what happens when when we set delta=2, gamma=1000. In the plot below, we show the original probabilities on the x-axis and the LLO-adjusted probabilities via delta=2, gamma=1000 on the y-axis. Notice that nearly all predictions were pushed to approximately 0 or 1.

# LLO-adjust using delta=2, gamma=1000
hockey$x2 <- LLO(hockey$x, delta=2, gamma=1000)

# plot original vs LLO-adjusted via 2,2
ggplot2::ggplot(data=hockey, mapping=aes(x=x, y=x2)) +
  stat_function(fun=LLO,
                args=list(delta=2, gamma=1000),
                geom="line",
                linewidth=1,
                color="black",
                xlim=c(0, 1)) +
  geom_point(aes(color=winner), alpha=0.75, size=2) +
  lims(x=c(0,1), y=c(0,1)) +
  labs(x="Original", y="LLO(x, 2, 1000)", color="Winner") +
  theme_bw()

While this didn’t cause problems plotting the LLO curve, there are cases in this package where probabilities very close to 0 or 1 will cause instability and possibly even break the code. With this in mind, we do have safeguard in place to keep predictions from getting to close (numerically speaking) to 0 or 1, which is discussed further in Specifying epsilon.

Specifying epsilon

When probability predictions that are “too close” to 0 or 1, calculations using the log likelihood become unstable. This is due to the fact that the logit function is not defined at 0 and 1. To mitigate this instability, we use epsilon to specify how close is “too close”. For values in x that are less than epsilon (i.e. too close to zero), these are replaced with epsilon. For values in x greater than 1 - epsilon (i.e. too close to 1), these are replaced with 1 - epsilon. Naturally, epsilon must take on a value between 0 and 1, but it is recommended that this be a very small number. By default, we use .Machine$double.eps.

In this example with the hockey data, there are no values close to 0 or 1 in x or rand, so epsilon is not used at all.

Specifying event

Throughout this package, the functions assume by default that y is a vector of 0s and 1s where a 1 represents a “event” and 0 represents a “non-event”. Additionally, these functions expect that the probabilities in x are the probabilities of “event” at each observation. While switching the labels of “event” and “non-event” will not change the fundamental conclusions, it is good practice to make sure event is defined properly. However, there may be cases where you want to pass a binary vector of outcomes that are not defined as 0s and 1s. To define which of the two values in y should be considered a “success” or the event of interest, users may set argument event to that value.

For example, let’s use the column winner from the hockey dataset as our vector of binary outcomes instead of y in bayes_ms(). We can do this by specifying event = "home". We’ll continue to set optim_details=FALSE for easier reading of the output.

bayes_ms(hockey$x, hockey$winner, event="home", optim_details=FALSE)
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730538
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 0.9453966 1.4005730

As expected, the results are identical to our previous call.

Maybe you want to approach this problem from the perspective of an event being a home team loss. To do so, you need to pass the predicted probabilities of a home team loss as x = 1 - hockey$x, and specify which value in y is an event via event=0 for y=hockey$y

bayes_ms(1-hockey$x, hockey$y, event=0, optim_details=FALSE)
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730605
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 1.057957 1.401514

… or event="away" for hockey$winner.

bayes_ms(1-hockey$x, hockey$winner, event="away", optim_details=FALSE)
> $Pmc
> [1] 0.5
> 
> $BIC_Mc
> [1] 1148.637
> 
> $BIC_Mu
> [1] 1157.902
> 
> $BF
> [1] 0.009730605
> 
> $posterior_model_prob
> [1] 0.9903632
> 
> $MLEs
> [1] 1.057957 1.401514

Note that we get the same results using either specification. Additionally, the posterior model probability is the same as when we considered event to be a home team win, as one might expect with binary event predictions.

Visualizing Posterior Model Probability as a Function of \(\delta\) and \(\gamma\) via plot_params()

The goal of the plot_params function is to visualize how the posterior model probability changes under different recalibration parameters, as this is used in boldness-recalibration. Let’s see what the default version of this plot looks like with FiveThirtyEight’s data.

plot_params(hockey$x, hockey$y)

You might immediately notice the imagine is grainy and unnecessarily small. In the following sections, we will demonstrate how to fix these issues and make further adjustments via arguments in plot_params().

Setting bounds and grid density

To adjust the grid of \(\delta\) and \(\gamma\) values, we can use dlim and glim to “zoom in” on the area of non-zero posterior model probabilities. Let’s reduce the range of \(\delta\) and \(\gamma\) so we can better visualize the area of non-zero posterior model probability of calibration.

plot_params(hockey$x, hockey$y, dlim=c(0.5, 1.5), glim=c(0.25, 2.75))

Next, we can use k to form a denser grid (and thus a smoother plotting surface). However, it’s important to note that the larger k values used, the slower this plot will generate.

plot_params(hockey$x, hockey$y, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75))

Saving and Reusing z

Once you’ve settled on the bounds for \(\delta\) and \(\gamma\) you want to plot and the density of the grid via k, you can save the underlying matrix of posterior model probabilities to reuse while making minor plot adjustments. This will save you the hassle of having to recalculate these values, which is where the bottleneck on time occurs. To save this matrix, set return_z=TRUE and store the returned list from plot_params().

zmat_list <- plot_params(hockey$x, hockey$y, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), 
                         return_z = TRUE)

The returned list is printed below. Note that the matrix is the first entry in the list named z. Additionally, the bounds passed via dlim and glim, and k are returned as a reminder of what values were used to construct z.

zmat_list
> $z
>                           0.25 0.262562814070352 0.275125628140704
> 0.5               7.232601e-35      1.480850e-34      3.016613e-34
> 0.505025125628141 3.785684e-34      7.721926e-34      1.567085e-33
...

> $dlim
> [1] 0.5 1.5
> 
> $glim
> [1] 0.25 2.75
> 
> $k
> [1] 200

While it’s a little hard to see what z looks like in the output above, below shows that z is a 200 by 200 matrix (where 200 is the grid size set by k). The rows and columns are named based on the value of \(\delta\) and \(\gamma\) for which the posterior model probability was calculated.

class(zmat_list$z)
> [1] "matrix" "array"
dim(zmat_list$z)
> [1] 200 200

Now we can reuse the matrix to adjust the plot by passing zmat_list$z to z in plot_params(). However, this trick is only useful so long as you do not expand the bounds over which you want to plot. Below shows the same z matrix plotted on (1) the same range, (2) a smaller range, and (3) a larger range than over which is was calculated. The white that appears in the third frame reflects the fact that the posterior model probability was not calculated for those grid cells since they are outside of the original bounds.

par(mfrow=c(1,3))
plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), main="Same range as z")
plot_params(z=zmat_list$z, k=200, dlim=c(0.7, 1.3), glim=c(0.5, 2.5), main="Smaller range than z")
plot_params(z=zmat_list$z, k=200, dlim=c(1e-04, 5), glim=c(1e-04, 5), main="Larger range than z")

We’ll continue to use this trick throughout to reduce plotting time.

Adding contours and using countors_only=TRUE

To add contours at specific levels of the posterior model probability of calibration, users can specify these levels in a vector via t_levels. Below, we add contours at t = 0.95, 0.9, and 0.8. See (Passing Additional / Different Arguments to image.plot() and contour())[#contour_opts] for guidance on adjusting the contours (i.e. changing color, line width, labels, etc).

plot_params(z=zmat_list$z, t_levels=c(0.95, 0.9, 0.8),
            k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75))

To plot the contour surface without the color background, users should specify contours_only=TRUE as shown below. We’ll also add a few more contour levels.

plot_params(z=zmat_list$z, t_levels=c(0.99, 0.95, 0.9, 0.8, 0.7),
            k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), contours_only=TRUE)

Passing Additional / Different Arguments to image.plot() and contour()

While plot_params() obscures the call to image.plot() and contour(), we allow users to leverage their full capabilities via arguments imgplt_options and contour_options. To pass additional arguments to image.plot() and contour(), users can create a list whose entries match the names of the arguments in these functions and pass them to imgplt_options and contour_options, respectively.

For example, below we can move the legend to appear horizontally below the plot using argument horizontal = TRUE in image.plot(). Additionally, we’ll reduce the number of colors used for plotting via nlevel=10 and add a legend label via legend.lab. Notice how these are passed as a list to imgplt_options.

plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75),
            imgplt_options=list(horizontal = TRUE, nlevel=10, legend.lab="Posterior Model Prob"))

We can also pass additional arguments to contour() via contour_options. Below, we’ll draw contours at t = 0.99 and 0.1. To make them dotted lines instead of solid, we’ll use lty="dotted", we’ll change the contour color to hot pink, and we’ll increase the line width using lwd=2.

plot_params(z=zmat_list$z, k=200, dlim=c(0.5, 1.5), glim=c(0.25, 2.75), t_levels = c(0.99, 0.1), 
            contour_options=list(lty = "dotted", col="hotpink", lwd=2))

Visualizing Recalibrated Probability Forecasts via lineplot()

The goal of the lineplot function is to visualize how predicted probabilities change under different recalibration parameters. By default this function shows changes in the original probabilities to the MLE recalibrated probabilities. Let’s see what this looks like with the FiveThirtyEight hockey data.

lineplot(x=hockey$x, y=hockey$y)

Note that this function leverages the flexibility of the ggplot2 package, so cosmetic modification and how the plot is returned is a little different than plot_params. We will demonstrate this throughout the following sections.

Specifying Levels of Boldness-Recalibration via t_levels

Similar to specifying contour levels in plot_params, users can specify levels of boldness-recalibration using t_levels in lineplot(). Below we add 95%, 90%, and 80% boldness-recalibration to the plot. Notice in the lineplot for FiveThirtyEight below that the posterior model probabilities on the x-axis are not in ascending or descending order. Instead, they simply follow the ordering of how one might use the BRcal package: first looking at the original predictions, then maximizing calibration, then examining how far they can spread out predictions while maintaining calibration with boldness-recalibration.

lineplot(x=hockey$x, y=hockey$y, t_levels = c(0.95, 0.9, 0.8))

Saving and reusing df

While the time expense of lineplot isn’t as high as plot_params, there is still a call to optim for MLE recalibration and a call to nloptr for each level of boldness-recalibration. Similar to returning the z matrix in plot_params, users of lineplot can specify return_df to return the underlying dataframe used to construct the plot.

By default, return_df=FALSE and lineplot() only returns the ggplot object of the lineplot. When return_df=TRUE, a list is returned with two entries: (1) plot which is the ggplot object of the lineplot and (2) df which is the underlying dataframe. The code below shows how to specify return_df=TRUE and saved the returned list.

lp <- lineplot(x=hockey$x, y=hockey$y, return_df = TRUE, t_levels = c(0.95, 0.9, 0.8))

To extract the plot, we can use the following code.

lp$plot

We can also extract the dataframe using lp$df. The first and last few rows of the dataframe plus a summary of the dataframe are printed below. Notice there are 6 columns, here are their brief descriptions (more details below):

  • probs: the values of each predicted probability under each set
  • outcome: the corresponding outcome for each predicted probability of calibration
  • post: the posterior model probability of the set as a whole
  • id: the id of each probability used for mapping observations between sets
  • set: the set with which the probability belongs to
  • label: the label used for the x-axis in the lineplot
lp$df
>           probs outcome      post  id       set                  label
> 1    0.55528238       1 0.9903632   1  Original   Original \n(0.99036)
> 2    0.64177575       1 0.9903632   2  Original   Original \n(0.99036)
> 3    0.55490924       1 0.9903632   3  Original   Original \n(0.99036)
> 4    0.51837525       0 0.9903632   4  Original   Original \n(0.99036)
> 5    0.55828545       0 0.9903632   5  Original   Original \n(0.99036)
> 6    0.45427667       0 0.9903632   6  Original   Original \n(0.99036)
...

> 4336 0.45558625       1 0.8000000 864   80% B-R         80% B-R\n(0.8)
> 4337 0.81615909       1 0.8000000 865   80% B-R         80% B-R\n(0.8)
> 4338 0.73767174       1 0.8000000 866   80% B-R         80% B-R\n(0.8)
> 4339 0.24187950       0 0.8000000 867   80% B-R         80% B-R\n(0.8)
> 4340 0.45564257       0 0.8000000 868   80% B-R         80% B-R\n(0.8)
summary(lp$df)
>      probs         outcome       post              id               set     
>  Min.   :0.09135   0:2025   Min.   :0.8000   Min.   :  1.0   80% B-R  :868  
>  1st Qu.:0.43415   1:2315   1st Qu.:0.9000   1st Qu.:217.8   90% B-R  :868  
>  Median :0.53261            Median :0.9500   Median :434.5   95% B-R  :868  
>  Mean   :0.53226            Mean   :0.9278   Mean   :434.5   MLE Recal:868  
>  3rd Qu.:0.63269            3rd Qu.:0.9904   3rd Qu.:651.2   Original :868  
>  Max.   :0.91671            Max.   :0.9989   Max.   :868.0                  
>                   label      
>  Original \n(0.99036)  :868  
>  MLE Recal. \n(0.99885):868  
>  95% B-R\n(0.95)       :868  
>  90% B-R\n(0.9)        :868  
>  80% B-R\n(0.8)        :868  
> 

The probs column contains each set of predicted probabilities that are plotted (i.e. the original, MLE recalibrated, and each set of boldness-recalibrated probabilities). The outcome column contains the corresponding outcome for each value in probs. Each set of probabilities and outcomes are essentially “stacked” on top of each other. The id column contains a unique id for each observation to map how observations change between sets. This essentially tells the plotting function how to connect (with line) the same observation as is changes from the original set to MLE- or boldness-recalibration. The set column contains a factor for which set that observation belongs to. The post column contains the value of the posterior model probability for the set as a whole, so this value will be the same for all observations in the same set. Lastly, the label column is a string that is used to label the x-axis for each set in the lineplot and is also the same for all observations in a set.

Since this dataframe has each set of probabilities stacked, the number of rows in the dataframe will be \(\text{# of sets } \times n\), where \(n\) is the number of observations per set. For example, in the data frame above, there are 5 sets (original, MLE, 95%, 90%, and 80% boldness-recalibration) with 868 observations total. So the length of the dataframe is 5 * 868 = 4340. This is confirmed below.

nrow(lp$df)
> [1] 4340

Now with this dataframe saved, we can pass it back via df without needing to specify x or y and we get the same plot as before, but much more quickly. This functionality makes it easier to make cosmetic adjustments to your plot without needing to wait for computations under the hood.

lineplot(df=lp$df)

Cosmetic Adjustments

A key difference in how this plot is created compared to plot_params() is that it uses ggplot2 instead of base R graphics. Users of lineplot() have quite a few options for making cosmetic changes to these plots. The first set of options we’ll discuss are those related to the points and lines already present on the lineplot. To modify the appearance of the points, users can pass options to geom_point() via a list to ggpoint_options. For example, the code below shows how to change the shape and size of the points.

lineplot(df=lp$df, ggpoint_options=list(size=3, shape=1))

lineplot(df=lp$df, ggline_options=list(linewidth=2, alpha=0.1))

An important thing to note about these two options is that their purpose is solely for adjustments that are not part of the aes statement in geom_point() or geom_line(). Should users specify aes options via ggpoint_options or ggline_options, this will cause ggplot() to create a new layer of points or lines on top of those following the default settings we use under the hood.

Another option for making cosmetic adjustments is to use the convention of ggplot2 and add additional ggplot2 functions to the returned lineplot using +. For example, the code below shows how we could add the scale_color_manual() function to our plot to change the colors of the points and lines. Notice that ggplot2 throws a warning about another color scale already being present. This is because under the hood we have already specified the default color scheme to be red and blue. A similar warning may appear in any case where a user adds a function that is already present in our original construction of the lineplot.

lineplot(df=lp$df) + scale_color_manual(values = c("orchid", "darkgreen"))
> Scale for colour is already present.
> Adding another scale for colour, which will replace the existing scale.

Thinning

Another strategy to save time when plotting is to reduce, or “thin”, the amount of data plotted. When sample sizes are large, the plot can become overcrowded and slow to plot. We provide three options for thinning:

  • thin_to: the observations in (x,y) are randomly sampled without replacement to form a set of size thin_to
  • thin_percent: the observations in (x,y) are randomly sampled without replacement to form a set that is thin_percent * 100% of the original size of (x,y)
  • thin_by: the observations in (x,y) are thinned by selecting every thin_by observation

By default, all three of these settings are set to NULL, meaning no thinning is performed. Users can only specify one thinning strategy at a time. Care should be taken in selecting a thinning approach based on the nature of your data and problem. Note that MLE recalibration and boldness-recalibration will be done using the full set, so the posterior model probabilities are those of the full set.

Below is an example of each of these strategies. Notice that each plot looks a little different since the observations selected are a little different under each strategy.

lp1 <- lineplot(df=lp$df, thin_to=500)
lp2 <- lineplot(df=lp$df, thin_percent=0.5)
lp3 <- lineplot(df=lp$df, thin_by=2)
gridExtra::grid.arrange(lp1, lp2, lp3, ncol=3)

By default, there is a seed set so that the observations selected stay the same through successive calls using the same thinning strategy. To change the seed, users can specify their own via seed. Notice in the example below that we’re using the same strategy as panel one in the above set of plots but the points are slightly different.

lineplot(df=lp$df, thin_to=500, seed = 47)

Alternatively, if users would prefer no seed be set, they can set seed=NULL. In the plot below, we again have a different set of points and lines. In fact, each compilation of this vignette will have a different set.

lineplot(df=lp$df, thin_to=500, seed=NULL)

Passing Additional Arguments to optim and nloptr

Similar to the plot_params function, this function allows users to specify additional arguments to optim via optim_options. Additionally, we also allow users to pass arguments to nloptr via nloptr_options in a similar fashion. In this case, the list passed via nloptr_options is passed to opts in the brcal function. For more information on how to structure this list, see Passing Additional / Different Arguments to nloptr().

References

Birgin, Ernesto G., and José Mario Martínez. 2008. “Improving Ultimate Convergence of an Augmented Lagrangian Method.” Optimization Methods and Software 23: 177–95. https://doi.org/10.1080/10556780701577730.
Conn, Andrew R., Nicholas I. M. Gould, and Philippe L. Toint. 1991. “A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds.” SIAM J. Numer. Anal. 28 (2): 545–72. https://doi.org/10.1137/0728030.
FiveThirtyEight. 2021. “2020-21 NHL Predictions.” fivethirtyeight.com. https://data.fivethirtyeight.com/.
Guthrie, Adeline P., and Christopher T. Franck. 2024. “Boldness-Recalibration for Binary Event Predictions.” The American Statistician 0 (0): 1–11. https://doi.org/10.1080/00031305.2024.2339266.
Johnson, Steven G. 2007. “The NLopt Nonlinear-Optimization Package.” https://github.com/stevengj/nlopt.
Kraft, Dieter. 1988. A Software Package for Sequential Quadratic Programming. Deutsche Forschungs- Und Versuchsanstalt für Luft- Und Raumfahrt köln: Forschungsbericht. Wiss. Berichtswesen d. DFVLR. https://books.google.com/books?id=4rKaGwAACAAJ.
———. 1994. “Algorithm 733: TOMP–Fortran Modules for Optimal Control Calculations.” ACM Trans. Math. Softw. 20 (3): 262–81. https://doi.org/10.1145/192115.192124.
Nelder, J. A., and R. Mead. 1965. A Simplex Method for Function Minimization.” The Computer Journal 7 (4): 308–13. https://doi.org/10.1093/comjnl/7.4.308.
Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” SIAM Journal on Optimization 12 (January): 555–73. https://doi.org/10.1137/S1052623499362822.