Skip to content

Export srs_diff_est()#340

Open
vinniott wants to merge 11 commits intostan-dev:masterfrom
vinniott:export-srs-diff-est
Open

Export srs_diff_est()#340
vinniott wants to merge 11 commits intostan-dev:masterfrom
vinniott:export-srs-diff-est

Conversation

@vinniott
Copy link
Contributor

Fixes #333

Note:

  • There is no @example yet because I did not have time yet to get fully familiar with the whole package.
  • I updated NEWS.md as suggested in CONTRIBUTING.md but I am not sure whether I did that correctly.

@vinniott vinniott mentioned this pull request Mar 22, 2026
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.78%. Comparing base (7eafeb8) to head (5d476b5).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #340   +/-   ##
=======================================
  Coverage   92.78%   92.78%           
=======================================
  Files          31       31           
  Lines        2992     2992           
=======================================
  Hits         2776     2776           
  Misses        216      216           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jgabry
Copy link
Member

jgabry commented Mar 23, 2026

Thank you @vinniott.

There is no @example yet because I did not have time yet to get fully familiar with the whole package.

@avehtari or @MansMeg, is there any specific example you'd like to use for this in the documentation?

@avehtari
Copy link
Member

The example should be based on flexible enough model so that elpd(log_lik_matrix) and loo(log_lik_matrix) differ more than by 1. The current example_loglik_matrix() has too few observations. The example used in tests for subsampling is one parameter model. Should we have a real model, or store another example loglik matrix?

After we have useful loglik matrix, the example code would be something like

# Use posterior predictive density as the fast but biased method for all observations
lpd <- elpd(log_lik_matrix)
sum(lpd$pointwise[,"elpd"])

# Use PSIS-LOO for subsample of 50 randomly selected observations
idx <- sample(1:N, 50)
elpd_loo_sub <- loo(log_lik_matrix[,idx])
20 * sum(elpd_loo_sub$pointwise[,"elpd_loo"])

# Use difference estimator to combine fast result and subsampled accurate result
loo:::srs_diff_est(lpd$pointwise[,"elpd"], elpd_loo_sub$pointwise[,"elpd_loo"], idx)

# Comparison to using PSIS-LOO for all observations
loo(log_lik_matrix)

This matches what someone was asking

@jgabry
Copy link
Member

jgabry commented Mar 23, 2026

Should we have a real model, or store another example loglik matrix?

Either is fine by me. Also if we're only using it for this example, we could also just generate an example loglik matrix in the example code instead of storing it.

@avehtari
Copy link
Member

I think the interesting examples can be slow to run. I'll test subsampling with few interesting real models this week

@avehtari
Copy link
Member

avehtari commented Mar 27, 2026

Thus would be a good example with data from https://archive.ics.uci.edu/ml/datasets/wine+quality

library(dplyr)
library(brms)
options(brms.backend = "cmdstanr")
options(mc.cores = 4)
library(loo)

wine <- read.delim(root("winequality-red", "winequality-red.csv"), sep = ";") |>
  distinct()

wine_scaled <- as.data.frame(scale(wine))

fitos <- brm(ordered(quality) ~ .,
            family = cumulative("logit"),
            prior = prior(R2D2(mean_R2 = 1/3, prec_R2 = 3)),
            data = wine_scaled,
            seed = 1,
            silent = 2,
            refresh = 0)

log_lik_matrix <- log_lik(fitos)

N <- nrow(wine_scaled)
Nsub <- 100

# posterior log-score
lpd <- elpd(log_lik_matrix)
sum(lpd$pointwise[,"elpd"])

# Use PSIS-LOO for subsample of Nsub randomly selected observations
set.seed(1)
idx <- sample(1:N, Nsub)
elpd_loo_sub <- loo(log_lik_matrix[,idx])
sum(elpd_loo_sub$pointwise[,"elpd_loo"]) / Nsub * N

# Use difference estimator to combine fast result and subsampled accurate result
loo:::srs_diff_est(lpd$pointwise[,"elpd"], elpd_loo_sub$pointwise[,"elpd_loo"], idx)

# Comparison to using PSIS-LOO for all observations
loo(log_lik_matrix)
  • p_loo is here about 17 and thus posterior log-score is clearly different
  • N is 1359, so that a subsample of 100 is still only small part of all observations
  • No high Pareto-k values to complicate things
  • Subsampling with Nsub gets close to the full result

As compiling and sampling the brms model takes some time, I would store only the log_lik_matrix but show the code for how it is generated. The rest of code is fast

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

export srs_diff_est

4 participants