“datasets are often highly structured, containing clusters of non-independent observational units that are hierarchical in nature, and Linear Mixed Models allow us to explicitly model the non-independence in such data”
(Harrison et al., 2018) [1]

They allow modeling of data measured on different levels at the same time – for instance students nested within classes and schools – thus taking complex dependency structures into account
(Bürkner, 2018) [2]

These models inherently regularize estimates towards a central value to an extent that depends on the heterogeneity of the underlying groups.
(Green & Thomas, 2019) [3]

The regularizing aspects of the ‘partial pooling’ inherent in the structure of these models (averaging parameter estimates between in-group and across-subject loadings) help to mitigate the impacts of multiple comparisons by pulling an estimate towards its central tendency to the extent warranted by the between-group variance
(Green & Thomas, 2019) [3]


This post will be an investigation into various types of regression models. The intent is to gain an intuition across a multitude of techniques and the context for this will be assessing stock valuation. A particular focus will be on multi-level models and ensuring robustness in the presence of outliers.

In order to do this we need a model, and I’m going to use the P/B - ROE model. This model relates the ratio of market capitalisation over book value to return on equity. We expect the regression co-efficient on ROE is positive, indicating that holding book value constant, a higher ROE leads to a higher valuation.

Regression techniques to be assessed will cover linear models estimated using ordinary least squares (on a pooled and un-pooled basis), multi-levels models (estimated using both frequentist and Bayesian approaches), and as mentioned above, non-parametric robust technique. Multi-level models will be extended to account for measurement error in predictor variables, a useful technique that Bayesian estimation allows us to implement. We will also model the response with a non-normal link functions, using the Student’s t distribution to account for outlying data points.

As alluded to in the quotes above, the focus will be on multi-level models. Why this particular focus? Multi-level models purport to provide robust results by virtue of partial pooling. Partial pooling results in regularisation of parameter estimates via the sharing of information across groups or clusters of observations. Stocks naturally fall into groups based on industry membership. This makes multi-level models a highly relevant technique in the modeling of stock valuations.

Robust regression techniques also purport to provide robust results (it’s all in the name isn’t it). We will see how these techniques, multi-level parametric and robust non-parametric, compare.

The Theil-Sen regression is a non-parametric robust technique. Theil-Sen has been discussed here, and is a robust technique that works to dampen the impact of outliers. It does so whereby the slope is derived taking the median of many individual slopes, those being fitted to each pair of data points.

Two approaches that have a similar aim, but take different routes to model data. It will be interesting to review the difference in fit between models that pool data and embed an underlying distributional assumption, and those that are non-parametric and use robust techniques.

# Libraries
library(dplyr)
library(tidyr)
library(DBI)
library(RPostgres)
library(DescTools)
library(lubridate)
library(mblm)
library(romerb)
library(recipes)
library(lme4)
library(rethinking)
library(brms)
library(DT)

data("stock_data")
fundamental_raw <- stock_data
rm(stock_data)

# Set default theme
def_theme1 <- theme_minimal() +
  theme(
    legend.title = element_blank(),
    legend.position = c(0.9,0.9),
    legend.background = element_blank(),
    legend.key = element_blank(),
    plot.caption = element_text(size = 8, color = "grey55", face = 'italic'), 
    axis.title.y = element_text(size = 8, color = "darkslategrey"),
    axis.title.x = element_text(size = 8, color = "darkslategrey"),
    axis.text.y = element_text(size = 7, color = "darkslategrey"),
    axis.text.x = element_text(size = 7, color = "darkslategrey")
    )

The data

The data under analysis comprises around 750 individual companies. These companies are grouped into 11 sectors and 40 odd industries. The data is as of June 2021 and comes from the Securites and Exchange Commission via my Stock_master database.

Below are some variables of interest. Our dependent variable is log_pb, the independent variable is roe.

data <- fundamental_raw %>% 
  filter(date_stamp == as.Date('2021-06-30')) %>% 
  mutate(
    log_mkt_cap = log(mkt_cap),
    log_assets = log(total_assets),
    log_equity_cln = log(-total_equity_cln),
    roe = -roe,
    roe_s = scale(roe),
    leverage_s = scale(leverage),
    sector = as.factor(sector),
    industry = as.factor(industry)
  ) %>% 
  select(date_stamp, symbol, log_mkt_cap, log_assets, log_equity_cln, roe, roe_s, leverage, leverage_s, vol_ari_60d, sector, industry, log_pb)

data %>% 
  pivot_longer(
    cols = c(log_pb, log_mkt_cap, roe, log_assets, log_equity_cln, leverage),
    names_to = 'attribute', 
    values_to = 'value'
    ) %>% 
  ggplot(aes(value)) + 
  geom_histogram(bins = 45) + 
  facet_wrap(vars(attribute), scales = 'free') +
  def_theme1

Some definitions, leverage is debt over assets, log_assets is the natural logarithm of total assets, log_equity_cln is the natural logarithm of total equity when equity is positive and the natural logarithm of 10% of assets when equity is negative, and roe is return on equity.

Aggregate analysis

We start applying our model to the full data set, ignoring the underlying sector structure.

The plot below shows the relationship between the log price/book ratio and ROE for all stocks regardless of sector or industry membership. The blue line is the regression line fitted using OLS, the grey line that fitted using the Theil-Sen robust estimator, and magenta is a Generalised Additive Model with a spline smooth.

# For Theil Sen line https://stackoverflow.com/questions/48349858/how-can-i-use-theil-sen-method-with-geom-smooth
sen <- function(..., weights = NULL) {
  mblm::mblm(...)
}

data %>% 
  ggplot(aes(x = roe, y = log_pb)) +
  geom_point(alpha = 0.3) +
  stat_smooth(method = 'gam', se = FALSE, formula = y ~ s(x), size = 0.6, colour = 'magenta') +
  geom_smooth(method = lm, se = FALSE, size = 0.6, color = "#3366FF") +
  geom_smooth(method = sen, se = FALSE, size = 0.6, colour = 'grey') +
  labs(
    title = 'Log book / price ratio versus return on equity',
    x = 'Return on equity',
    y = 'Log price / book ratio'
  ) +
  theme(
    plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey40", hjust = 0),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank()
  ) +
  def_theme1

# Add predictions to df
data$lm_agg_pred <- predict(lm(log_pb ~ roe, data = data))
data$ts_agg_pred <- predict(mblm(log_pb ~ roe, data = data))
data$gam_agg_pred <- predict(mgcv::gam(log_pb ~ s(roe), data = data))

The Theil-Sen regression seems to fit the data better (at least on a rough eyeballing of the plot), fitting the dense cluster of data points more closely. As expected, the slope of the line is positive, indicating all else equal, a higher valuation given higher return on equity.

There is a hint of non-linearity in the relationship between the price / book ratio and ROE, and this is captured by the GAM.

Unpooled analysis

Stocks with similar characteristics are grouped into sectors. It is reasonable to expect that sectors will have different characteristics giving rise to differing relationships between ROE and valuation. Different sectors may have different growth prospects and risk profiles for example.

With this in mind, the plot below shows the same regression models estimated above (the GAM has been removed), applied individually to each sector.

# Model coefficients for OLS and Thiel Sen regression
model_coefs <- data %>% 
  group_by(date_stamp, sector) %>% 
  filter(n() > 1) %>% 
  nest() %>% 
  mutate(
    fit_ols = purrr::map(.x = data, .f = ~lm(log_pb ~ roe, data = .x)),
    fit_ts = purrr::map(.x = data, .f = ~mblm(log_pb ~ roe, data = .x, repeated = TRUE)),
    int_ols = purrr::map_dbl(.x = fit_ols, .f = function(x) coef(summary(x))['(Intercept)', 'Estimate']),
    slp_ols = purrr::map_dbl(.x = fit_ols, .f = function(x) coef(summary(x))['roe', 'Estimate']),
    int_ts = purrr::map_dbl(.x = fit_ts, .f = function(x) coef(summary(x))['(Intercept)', 'Estimate']),
    slp_ts = purrr::map_dbl(.x = fit_ts, .f = function(x) coef(summary(x))['roe', 'Estimate']),
    x_min = purrr::map_dbl(.x = data, .f = ~min(.x$roe)),
    x_max = purrr::map_dbl(.x = data, .f = ~max(.x$roe)),
    y_min = int_ts + slp_ts * x_min,
    y_max = int_ts + slp_ts * x_max
  ) %>% 
  select(-fit_ols, -fit_ts, -data) %>%
  ungroup()

# Predictions
data <- merge(data, model_coefs, by.x = 'sector', by.y = 'sector')
data$ts_ind_pred <- data$int_ts + data$roe * data$slp_ts
data$ols_ind_pred <- data$int_ols + data$roe * data$slp_ols
data <- subset(data, select = -c(date_stamp.y, x_min, y_min, x_max, y_max))
names(data)[names(data) == 'date_stamp.x'] <- 'date_stamp'

# Facet labeller
sector_labels <- as_labeller(c(
  "1" = "Industrials",
  "2" = "Technology",
  "3" = "Consumer Defensive",
  "4" = "Consumer Cyclical",
  "5" = "Financial Services",
  "6" = "Utilities",
  "7" = "Healthcare",
  "8" = "Energy",
  "9" = "Business Services",
  "10" = "Real Estate",
  "11" = "Basic Materials"))

# Plot
p1 <- data %>% 
  ggplot(aes(x = roe, y = log_pb)) +
  facet_wrap(~reorder(sector, as.numeric(sector)), ncol = 4, scales = 'free', labeller = sector_labels) + 
  geom_point(alpha = 0.3) +
  geom_smooth(method = lm, se = FALSE, size = 0.6, color = "#3366FF") + 
  geom_segment(
    aes(x = x_min, xend = x_max, y = y_min, yend = y_max),
    alpha = 0.6,
    data = filter(model_coefs, date_stamp == as.Date('2021-06-30')),
    colour = 'grey40'
  ) +
  labs(
    title = 'Log book / price ratio versus return on equity by industry',
    subtitle = 'Estimated with independent OLS (blue) and independent Theil-Sen (grey)',
    x = 'Return on equity',
    y = 'Log price / book ratio'
  ) +
  def_theme1

p1

How has the line of best fit changed? Once again we might argue the robust regression fits the data better. Witness the Industrials sector. The data points distinct from the diagonal cluster skew the OLS derived slope, effectively pulling the left hand side of the line higher. The resultant slope is shallower than that derived using the Theil-Sen regression.


Below, we visualise the intercepts and slopes for each of the individual linear models.

The numbers assigned to sectors in this plot (and all that follow), follow the order above, i.e. 1 is Industrials and 11 is Basic Materials.

model_coefs_all <- bind_rows(
  data.frame(
    sector = as.factor(model_coefs$sector), 
    intercept = model_coefs$int_ols, 
    slope = model_coefs$slp_ols, 
    source = rep("OLS", 11)
    ),
  data.frame(
    sector = as.factor(model_coefs$sector), 
    intercept = model_coefs$int_ts, 
    slope = model_coefs$slp_ts, 
    source = rep("TS", 11)
    ) 
  )

model_coefs_all %>% 
  filter(source %in% c('OLS','TS')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with independent OLS and Theil-Sen',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("OLS"="#3366FF", "TS"="grey40"))

A couple of the OLS models have negative slopes, and this is counter to expectation. Consistent with the observations above, the Theil-Sen slopes are more compact or closer to each other.


Unpooled analysis - LSDV

We now add a Least Squares Dummy Variable regression. The plot below is presented with fixed scales to demonstrate the identical slopes that the LSDV structure enforces.

data1 <- data %>% 
  select(date_stamp, sector, log_pb, log_mkt_cap, roe, log_assets, log_equity_cln, leverage)

rec <- recipe(log_pb ~ roe + sector, data = data1)

lsdv_data <- rec %>% step_dummy(sector, keep_original_cols = TRUE) %>% prep(training = data) %>% bake(new_data = NULL)

lsdv_mdl <- lm(log_pb ~ roe + sector_X2 + sector_X3 + sector_X4 + 
               sector_X5 + sector_X6 + sector_X7 + sector_X8 + 
               sector_X9 + sector_X10 + sector_X11, 
               data = lsdv_data)

lsdv_data$lsvd_pred <- predict(lsdv_mdl)
data$lsvd_pred <- lsdv_data$lsvd_pred

# Plot
p2 <- p1 + 
  geom_line(aes(x = roe, y = lsvd_pred), data = lsdv_data, size = 0.6, color = "black") +
  labs(subtitle = 'Estimated with independent OLS (blue), independent Theil-Sen (grey) and LSDV (black)') +
  facet_wrap(~reorder(sector, as.numeric(sector)), ncol = 4, scales = 'fixed', labeller = sector_labels) +
  def_theme1

p2

The plot is a little harder to read with the constant scale. We can see however that the LSDV forces an unnatural fit. Financial Services for example has a much smaller slope than would appear warranted. This is driven by the influence of other sectors that have a large slope. Financials are in effect an outlier among sectors and the rigidity of the model does not allow the data to influence this sectors slope.

Utilities has a comparatively compact range of returns on equity and valuations, this is consistent with a regulated industry. If a group of businesses profitability and growth is capped, variation amongst returns is constrained.


Intercepts and slopes for individual linear models by sector, and the LSDV model.

lsdv_mdl_coef <- as.data.frame(coef(lsdv_mdl)) %>% 
  tibble::rownames_to_column(var = 'intercept') %>% 
  mutate(slope = unname(coef(lsdv_mdl)['roe']))
lsdv_mdl_coef[1, 1] <- 1
lsdv_mdl_coef <- lsdv_mdl_coef[lsdv_mdl_coef$intercept != 'roe', ]
lsdv_mdl_coef$intercept <- as.factor(gsub("[^0-9.]", "",  lsdv_mdl_coef$intercept))
lsdv_mdl_coef$source <- 'LSDV'
names(lsdv_mdl_coef)[1:2] <- c('sector','intercept')

lm_mdl_coefs <- model_coefs %>% 
  filter(date_stamp == as.Date('2021-06-30')) %>% 
  select(sector, int_ols, slp_ols) %>% 
  mutate(sector = as.factor(sector), source = 'OLS') %>% 
  rename(intercept = int_ols, slope = slp_ols)

model_coefs_all <- bind_rows(model_coefs_all, lsdv_mdl_coef)  


# Plot
c2 <- model_coefs_all %>% 
  filter(source %in% c('OLS','TS','LSDV')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with independent OLS, Theil-Sen and LSDV',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("OLS"="#3366FF", "TS"="grey40", "LSDV"="black"))

# Plot
c2


As discussed above, the LSDV model enforces a constant slope across groups and we can see this above.

Partial pooling

We now turn to multi-level models. Our expectation in fitting these types of models is that parameters will experience shrinkage to the mean as the population level data influences each groups coefficients. The references and introductory quotes inform this view.

Of interest is the extent to which the shrinkage aligns coefficients with the Theil-Sen robust model.

The mixed effects regression lines plotted below are estimated using the lme4 package.

# Mixed effects model with fixed and random intercept 
lme_1 <- lmer(log_pb ~ roe + (1 + roe | sector), data = data)

# Add predicted values for line in scatter plot
data$lme1_pred <- predict(lme_1)

p3 <- p2 + 
  geom_line(aes(x = roe, y = lme1_pred), data = data, size = 0.6, color = 'magenta2', linetype = 'longdash') + #darkorchid3
  labs(subtitle = 'Estimated with independent OLS (blue), independent Theil-Sen (grey), LSDV (black) \nand Multi level basis (dashed magenta)') +
  facet_wrap(~reorder(sector, as.numeric(sector)), ncol = 4, scales = 'free', labeller = sector_labels)

p3

And the the coefficients.

# Plot coefficients - add coefficients to existing plot
# CHANGE SOURCE TO MODEL
lme1_coefs = data.frame(
  sector = as.factor(rownames(ranef(lme_1)$sector)),
  intercept = unname(coef(lme_1)$sector['(Intercept)']),  # works only for single grouping variable
  slope = unname(coef(lme_1)$sector['roe']),              # works only for single grouping variable
  source = rep('Multi level', 11)
  )

model_coefs_all <- bind_rows(model_coefs_all, lme1_coefs)

model_coefs_all %>% 
  filter(source %in% c('OLS','TS','LSDV','Multi level')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with independent OLS, LSDV, Theil-Sen and Mixed effects',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("OLS"="#3366FF", "TS"="grey40", "LSDV"="black", "Multi level"="magenta2"))

In light of expectations, the results above are a little underwhelming. There is essentially no difference between the mixed effects and individual OLS models. Why is this so? That question should be considered in light of the drivers of parameter shrinkage in multi-level models. Shrinkage is driven by:

  1. The size of the group, groups with more individual members will experience less shrinkage, and

  2. The variance of the random effects (how the groups differ) versus the residual variance (how the observations within each group differ). The more groups differ, versus the extent individuals within groups differ, the less shrinkage.1

Both of these points demonstrate that these models “let the data speak”. If groups differ substantially and have a lot of members, the parameters inferred from that group remain relatively unchanged. The vice versa is true.

Back to the question - why so little shrinkage? This is probably due to the small number of groups (11) and the large amount of data points within each group.

Sectors that experienced the most shrinkage (the difference between the independent OLS and multi-level models) are those that have fewer data points and / or the most extreme (furthest from average) parameter estimates pre shrinkage. Sector 9 (Business Services) falls into this category.

One final point in relation to this model. As specified, the model estimates a correlation between the intercepts and slopes for each sector. Is this correlation a reasonable assumption to make in terms of model structure? Lets explore that question.

What does the intercept, or more to the point, different intercepts across sectors represent? The intercept in a regression model is the outcome when the predictor is zero. For us, the intercept is therefore the premium or discount of market value over book value, when ROE is nil.

What about the slope coefficient on ROE? Wilcox [5] at p.199 defines the slope as representing the investment horizon before the ROE reverts to its mean 2.

So, should the P/B ratio when ROE is nil systematically change with the pre mean reversion investment horizon? I’m going to say no 3.

All the same, I will continue to model using the default correlation structure, simply because it is the default behaviour for lme4 and I doubt the effect is significant.


Bayesian approach

Next, we estimate the multi-level model using a Bayesian approach. A Bayesian approach will allow for the specification of priors over both the intercept and slope coefficient, along with the correlation between those parameters. An appropriately specified prior should enforce shrinkage (if the data allows) and provides the opportunity to encode our domain knowledge. Bayesian models also allow for uncertainty quantification, however this will not be looked at with this analysis.

Let’s think about the priors. We stated above, the intercept is defined as the outcome (P/B ratio) when the predictor (ROE) is zero. So what is a reasonable expectation of the P/B ratio when ROE is zero? Ignoring the log transform initially, if an asset does not earn a return then it should not command a premium. In this case the book and market values are expected to be identical. Therefore, when ROE is zero we should expect a price to book ratio of one (the log thereof being zero). Eyeballing the second plot above largely supports this theoretical narrative.

What about the slope coefficient on ROE? As stated, we expect that the slope of the regression co-efficient is positive. All else equal, a higher ROE leads to a higher valuation.

Reflective of this, the model below therefore has a N(0, 1.5) prior for the intercept and N(1, 1.5) prior for the slope.

The Bayesian mixed effects model below is fit with the Rethinking package.

# Data in Rethinking / McElreath format
d <- list(
  log_pb = data$log_pb,
  roe    = data$roe,
  vol    = scale(data$vol_ari_60d),
  sector = data$sector,
  n      = nrow(data)
  )

set.seed(123)

bayes1 <- ulam(
  alist(
    
    # Response distribution
    log_pb ~ normal(mu, sigma),

    # Linear model & residual SD
    mu <- a_sector[sector] + b_sector[sector] * roe,
    sigma ~ exponential(1),                     # prior for residual SD of response dist.
    
    # Variance covariance matrix for intercept and slope (note multi_normal = dmvnorm2)
    c(a_sector, b_sector)[sector] ~ multi_normal(c(a, b), Rho, sigma_sector),
    
    # Priors for covariance matrix
    a ~ normal(0, 1.5),                         # prior for average intercept
    b ~ normal(1, 1.5),                         # prior for average slope
    Rho ~ lkj_corr(2),                          # prior for correlation matrix
    sigma_sector ~ exponential(1)               # prior for vector of SD's (slope & int) in covariance matrix
  ),
  data = d,
  cores = 4
  )

# Coefficients
bayes1_coef_tbl <- coeftab(bayes1)
bayes1_coef <- data.frame(coef = row.names(bayes1_coef_tbl@coefs), value = bayes1_coef_tbl@coefs, row.names = NULL)
bayes1_coef$t <- substr(bayes1_coef$coef, 1, 1)
bayes1_coef$s <- as.integer(gsub("\\D", "", bayes1_coef$coef))

bayes1_coef1 <- data.frame(
  sector = as.factor(bayes1_coef[!is.na(bayes1_coef$s) & bayes1_coef$t == 'b', ]$s),
  intercept = bayes1_coef[!is.na(bayes1_coef$s) & bayes1_coef$t == 'a', ]$bayes1 + bayes1_coef[bayes1_coef$coef == 'a', ]$bayes1 * 0,
  slope = bayes1_coef[!is.na(bayes1_coef$s) & bayes1_coef$t == 'b', ]$bayes1 + bayes1_coef[bayes1_coef$coef == 'b', ]$bayes1 * 0, # included fixed effect / popn level param
  source = 'Bayes normal'
  )

model_coefs_all <- bind_rows(model_coefs_all, bayes1_coef1)  


# Predictions
bayes1_pred_samples <- link(bayes1, data = d)
data$bayes1_pred <- apply(bayes1_pred_samples, 2, mean)

Here is the plot of the regression.

# Plot
p4 <- p1 + 
  geom_line(aes(x = roe, y = bayes1_pred), data = data, size = 0.6, colour = 'magenta') +
  labs(subtitle = 'Estimated with independent OLS (blue), independent Theil-Sen (grey) \nand Bayesian mixed effects (magenta)') +
  def_theme1

p4


model_coefs_all %>% 
  filter(source %in% c('Bayes normal', 'Multi level', 'TS')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with independent Bayesian mulitlevel and Mixed effects',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("Bayes normal"="black", "TS"="grey40", "Multi level"="magenta2"))


Bayesian approach - Student’s t likelihood

We now estimate the Bayesian model using tighter priors and a Student’s t likelihood. Models fitted thus far have employed a normal likelihood, the Student’s t distribution allows for fatter tails and hence for a more robust approach to dealing with outlying observations. Will this model configuration result in the un-intuitive negative slopes turning positive?

The prior over the slope in the model below is N(3, 1). The previous models prior is N(1, 1.5).

# Data in Rethinking / McElreath format
set.seed(123)
bayes2 <- ulam(
  alist(
    
    # Response distribution
    log_pb ~ dstudent(nu , mu, sigma),

    # Linear model
    mu <- a_sector[sector] + b_sector[sector] * roe,
    
    # Variance co-variance matrix prior for intercept and slope
    c(a_sector, b_sector)[sector] ~ multi_normal(c(a, b), Rho, sigma_sector),
    a ~ normal(1.25, 1),                         # prior for average intercept
    b ~ normal(3, 1),                            # prior for average slope
    nu ~ gamma(2, 0.1),                          # use brms default prior
    sigma_sector ~ exponential(1),
    sigma ~ exponential(1),                      # prior for residual standard deviation of response dist.
    Rho ~ lkj_corr(2)                            # prior for correlation matrix
  ),
  data = d,
  cores = 4
  )

# Coefficients
bayes2_coef_tbl <- coeftab(bayes2)
bayes2_coef <- data.frame(coef = row.names(bayes2_coef_tbl@coefs), value = bayes2_coef_tbl@coefs, row.names = NULL)
bayes2_coef$t <- substr(bayes2_coef$coef, 1, 1)
bayes2_coef$s <- as.integer(gsub("\\D", "", bayes2_coef$coef))

bayes2_coef1 <- data.frame(
  sector = as.factor(bayes2_coef[!is.na(bayes2_coef$s) & bayes2_coef$t == 'b', ]$s),
  intercept = bayes2_coef[!is.na(bayes2_coef$s) & bayes2_coef$t == 'a', ]$bayes2 + bayes2_coef[bayes2_coef$coef == 'a', ]$bayes2 * 0,
  slope = bayes2_coef[!is.na(bayes2_coef$s) & bayes2_coef$t == 'b', ]$bayes2 + bayes2_coef[bayes2_coef$coef == 'b', ]$bayes2 * 0, # included fixed effect / popn level param
  source = 'Bayes student'
  )

model_coefs_all <- bind_rows(model_coefs_all, bayes2_coef1)  


# Predictions
bayes2_pred_samples <- link(bayes2, data = d)
data$bayes2_pred <- apply(bayes2_pred_samples, 2, mean)



# Plot
p5 <- data %>% 
  ggplot(aes(x = roe, y = log_pb)) +
  facet_wrap(~reorder(sector, as.numeric(sector)), ncol = 4, scales = 'free', labeller = sector_labels) + 
  geom_point(alpha = 0.3) +
  geom_line(aes(x = roe, y = bayes1_pred), data = data, size = 0.6, colour = 'blue') +
  geom_line(aes(x = roe, y = bayes2_pred), data = data, size = 0.6, colour = 'green') +
  geom_segment(
    aes(x = x_min, xend = x_max, y = y_min, yend = y_max),
    alpha = 0.3,
    data = filter(model_coefs, date_stamp == as.Date('2021-06-30')),
    colour = 'grey40', size = 0.6
  ) +
  labs(
    title = 'Log book / price ratio versus return on equity by industry',
    subtitle = "Estimated with Bayes normal (blue), Bayes Student's t (green) and independent Theil-Sen (grey)",
    x = 'Return on equity',
    y = 'Log price / book ratio'
  ) +
  def_theme1

p5


model_coefs_all %>% 
  filter(source %in% c('Bayes normal','Bayes student', 'TS')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with independent TS and Bayesian multi level (student & normal)',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("Bayes normal"="blue", "Bayes student"="green", "TS"="grey40", "Multi level"="magenta2"))

Using the Student’s t distribution does not result in drastically different slope estimates (except sector 7 - Healthcare). It is entirely possible the uncertainty around parameter estimates has been reduced (remember that is something we get with Bayesian estimation), however as stated, assessing parameter uncertainty is beyond the scope of this post.


Bayesian model with measurement error (Student’s t)

We now attempt to account for measurement error in the price to book ratio. It is well known that stock prices are more volatile than underlying business fundamentals. It is therefore reasonable to expect that any measurement error or random noise in this ratio scales with the volatility of the underlying stock price.

The model that follows is based on that per McElreath [6] p.493. This model considers the true underlying P/B ratio a function of the observed ratio and an error component that scales with trailing price volatility.

# Data in Rethinking / McElreath format
set.seed(123)
bayes3 <- ulam(
  alist(
    
    # Response distribution
    log_pb ~ dstudent(nu, log_pb_t, sigma),
    vector[n]:log_pb_t ~ dstudent(nu_t, mu, sigma),  # declare a vector of length n for each log_pb_t???  https://github.com/rmcelreath/rethinking/tree/Experimental#multilevel-model-formulas

    # Linear model
    mu <- a_sector[sector] + b_sector[sector] * roe,
    
    # Variance co-variance matrix prior for intercept and slope
    c(a_sector, b_sector)[sector] ~ multi_normal(c(a, b), Rho, sigma_sector),
    a ~ normal(1.25, 1),                         # prior for average intercept
    b ~ normal(1, 1.5),                          # prior for average slope
    nu ~ gamma(2, 0.1),                          # brms default prior
    nu_t ~ gamma(2, 0.1),                        # brms default prior
    sigma_sector ~ exponential(1),
    sigma ~ exponential(1),                      # prior for residual standard deviation of response dist.
    Rho ~ lkj_corr(2)                            # prior for correlation matrix
  ),
  data = d,
  cores = 4
  )

# Coefficients
bayes3_coef_tbl <- coeftab(bayes3)
bayes3_coef <- data.frame(coef = row.names(bayes3_coef_tbl@coefs), value = bayes3_coef_tbl@coefs, row.names = NULL)
bayes3_coef$t <- substr(bayes3_coef$coef, 1, 1)
bayes3_coef$s <- as.integer(gsub("\\D", "", bayes3_coef$coef))

bayes3_coef1 <- data.frame(
  sector = as.factor(bayes3_coef[!is.na(bayes3_coef$s) & bayes3_coef$t == 'b', ]$s),
  intercept = bayes3_coef[!is.na(bayes3_coef$s) & bayes3_coef$t == 'a', ]$bayes3 + bayes3_coef[bayes3_coef$coef == 'a', ]$bayes3 * 0,
  slope = bayes3_coef[!is.na(bayes3_coef$s) & bayes3_coef$t == 'b', ]$bayes3 + bayes3_coef[bayes3_coef$coef == 'b', ]$bayes3 * 0, # included fixed effect / popn level param
  source = 'Bayes student me'
  )

model_coefs_all <- bind_rows(model_coefs_all, bayes3_coef1)  


# Predictions
bayes3_pred_samples <- link(bayes3, data = d)
data$bayes3_pred <- apply(bayes3_pred_samples, 2, mean)
p6 <- p5 + 
  geom_line(aes(x = roe, y = bayes3_pred), data = data, size = 0.3, colour = 'red') +
  labs(
    title = 'Log book / price ratio versus return on equity by industry',
    subtitle = "Estimated with Bayes normal (blue), Bayes Student's t (green), Bayes Student's t with \nmeasurement error (red) and independent Theil-Sen (grey)",
    x = 'Return on equity',
    y = 'Log price / book ratio'
  )

p6

model_coefs_all %>% 
  filter(source %in% c('Bayes normal','Bayes student','Bayes student me', 'TS')) %>% 
  ggplot(aes(x = intercept, y = slope, colour = source)) +
  geom_point(alpha = 0.7) + ylim(-1.25, 5.5) +
  geom_text(aes(label = sector), nudge_x = 0.03, nudge_y = 0.05, check_overlap = TRUE, show.legend = FALSE, size = 3) +
  labs(
    title = 'Regression intercept and slope',
    subtitle = 'Estimated with various Bayesian multi level specifications and independent Theil-Sen',
    x = 'Intercept',
    y = 'Slope'
  ) +
  def_theme1 +
  scale_color_manual(values=c("Bayes normal"="blue", "Bayes student"="green", "Bayes student me"="red", "TS"="grey40"))

Once again, we do not see a large deviation in slope estimates across different techniques. It is difficult to pin down why this is so without performing a slew of additional analysis. I’m going to point out what I suspect is the primary issue, that being omitted variable bias. The driver of the outlying data points is a factor not modeled. The processes governing the valuation of any stock are extremely complex and noisy. We cannot hope to flexibly represent the valuation process as a model with only a single explanatory variable.

Multi-level GAM

Lastly we fit a multi-level Generalised Additive Model. The model below incorporates non-linearity, however it is constrained in that slopes (or more correctly splines) across sector must conform to a global shape. This model is of the “GS” type (global smoother with individual effects) specified in Pedersen et al. 2019 [7].

library(mgcv)

# Multi-level GAM 
gam_mlm <- gam(
  log_pb ~ s(roe, k = 5, m = 2) + s(roe, sector, k = 5, m = 2, bs = "fs"),
  data = data, 
  method = "REML"
  )

# Predict
gam_mlm_preds <- predict(gam_mlm, se.fit = TRUE)
data$gam_mlm_pred <- gam_mlm_preds$fit
data$gam_mlm_sepred <- gam_mlm_preds$se.fit

ggplot(data = data, aes(x = roe, y = log_pb, group = sector)) +
  facet_wrap(~reorder(sector, as.numeric(sector)), ncol = 4, scales = 'free', labeller = sector_labels) +
  geom_ribbon(aes(ymin = gam_mlm_pred - 2 * gam_mlm_sepred,
                  ymax = gam_mlm_pred + 2 * gam_mlm_sepred), alpha=0.25) +
  geom_line(aes(y = gam_mlm_pred)) +
  geom_point(alpha = 0.3) +
  labs(
    title = 'Log book / price ratio versus return on equity by industry',
    subtitle = 'Estimated with multi-level GAM',
    x = 'Return on equity',
    y = 'Log price / book ratio'
  ) +
  def_theme1


Summary

Thus far we have assessed model fit based on conformity with expectations and eyeballing of scatter plots and fitted regression lines. To summarise the various model configuration performance, lets look at each model from a predicted vs actual and predicted vs residual perspective. This may tease out further insights.

resid_data <- data %>% 
  mutate(
    lm_agg_e = lm_agg_pred  - log_pb,
    ts_agg_e  = ts_agg_pred  - log_pb,
    gam_agg_e = gam_agg_pred - log_pb,
    ols_ind_e = ols_ind_pred  - log_pb,
    ts_ind_e  = ts_agg_pred  - log_pb,
    lsvd_e    = lsvd_pred    - log_pb,
    lme1_e    = lme1_pred    - log_pb,
    gam_mlm_e  = gam_mlm_pred  - log_pb,
    bayes3_e  = bayes3_pred  - log_pb
  ) %>% 
  select(symbol,log_pb:bayes3_e) %>% 
  select(-int_ols, -int_ts, -slp_ols, -slp_ts, -bayes1_pred, -bayes2_pred, -gam_mlm_sepred) %>% 
  pivot_longer(!c(symbol,log_pb), names_to = 'model_type', values_to = 'value') %>% 
  mutate(data_type = if_else(grepl('_e', model_type), 'residual', 'fitted'))


# Re-order for facet
#resid_data$model_type <- factor(resid_data$model_type, levels = c("lm_agg_pred","ols_ind_pred","ts_agg_pred","gam_agg_pred","ts_ind_pred","lsvd_pred","lme1_pred","gam_mlm_pred","bayes3_pred"))


model_label <- as_labeller(c(
  lm_agg_pred = "Aggregate OLS",
  ols_ind_pred = "Independent OLS",
  ts_agg_pred = "Aggregate Theil-Sen",
  gam_agg_pred = "Aggregate GAM",
  ts_ind_pred = "Independent Theil-Sen",
  lsvd_pred = "Least Squares Dummy Variable",
  lme1_pred = "Frequentist Multi-Level",
  gam_mlm_pred = "Multi-Level GAM",
  bayes3_pred = "Bayesian Multi-Level"
  ))


ggplot(
  resid_data[resid_data$data_type == 'fitted', ], 
  aes(y = log_pb, x = value)
  ) + 
  geom_point(alpha = 0.1) +
  geom_abline(intercept = 0, slope = 1) +
  facet_wrap(vars(model_type), labeller = model_label) +
    labs(
    title = 'Log book / price ratio',
    subtitle = 'Actual vs fitted',
    y = 'Actual',
    x = 'Fitted'
  ) +
  theme(
    plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey40", hjust = 0),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank()
  ) +
  def_theme1

The multi-level GAM and independently fit Theil-Sen models appear to be the best fitting models.


resid_data2 <- resid_data %>% 
  mutate(
    model_type = sub("_pred", "", model_type),
    model_type = sub("_e", "", model_type)
    ) %>% 
  pivot_wider(id_cols = c(symbol, model_type), names_from = data_type, values_from = value)

# Re-order for facet
resid_data2$model_type <- factor(resid_data2$model_type, levels = c("lm_agg","ols_ind","ts_agg","gam_agg","ts_ind","lsvd","lme1","gam_mlm","bayes3"))

mae <- aggregate(resid_data2$residual, by = list(model_type = resid_data2$model_type), FUN = function(x) mean(abs(x)))

model_label2 <- as_labeller(c(
  lm_agg = "Aggregate OLS",
  ols_ind = "Independent OLS",
  ts_agg = "Aggregate Theil-Sen",
  gam_agg = "Aggregate GAM",
  ts_ind = "Independent Theil-Sen",
  lsvd = "Least Squares Dummy Variable",
  lme1 = "Frequentist Multi-Level",
  gam_mlm = "Multi-Level GAM",
  bayes3 = "Bayesian Multi-Level"
  ))

ggplot(
  resid_data2, 
  aes(y = residual, x = fitted)
  ) + 
  geom_point(alpha = 0.1) +
  geom_hline(yintercept = 0, alpha = 0.3) +
  facet_wrap(vars(model_type), labeller = model_label2) +
    labs(
    title = 'Log book / price ratio',
    subtitle = 'Residual vs fitted',
    y = 'Residual',
    x = 'Fitted'
  ) +
  geom_text(data = mae, size = 2.5, (aes(x = 4, y = -5, label = paste0("Mean absolute\nerror: ", round(x,2), sep = " "), colour = NULL, fill = NULL)), hjust = 0) +
  theme(
    plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey40", hjust = 0),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank()
  ) +
  def_theme1


A well fitted model will have its residuals plotting an equal variance about the zero line, doing so across all values of the predicted response. This is obviously not the case for the models above, there is quite a bit of clustering of errors. We can say the multi-level GAM and Bayesian multi-level models look best.


Closing

We set out to gain an intuition over various regression modeling techniques using real data, applied to a real problem. That has been achieved with Bayesian, frequentist and robust techniques having been applied in a stock valuation setting. This post was not designed to genuinely find a useful stock valuation model, that is of course highly complex and out of scope for this short note. It is rather a means to an end, that being to understanding how different models work and getting the hands dirty with real data.

As always, a bunch of modeling considerations have not been covered:

  1. As stated, modeling a complex phenomenon like stock valuations will be grossly under-specified with a single explanatory variable.

  2. Bayesian models provide parameter uncertainty quantification. This has not been analysed.

  3. The suitability of the multi-level structure has not been assessed. This could be performed with the intraclass-correlation coefficient (ICC).

  4. Formal model comparison and model fit diagnostics has not been performed.

I’ll close with the question as to whether deviations from modeled valuation (of a more sophisticated variety, taking account of say growth and risk) correlate with future returns? Something for the next round of analysis.

References

[1] Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, Robinson BS, Hodgson DJ, Inger R. 2018. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ 6:e4794 https://doi.org/10.7717/peerj.4794


[2] Paul-Christian Bürkner. Advanced Bayesian Multilevel Modeling with the R Package brms. 2018


[3] Green, Brice and Thomas, Samuel, Inference and Prediction of Stock Returns using Multilevel Models (August 31, 2019). Available at SSRN: https://ssrn.com/abstract=3411358 or http://dx.doi.org/10.2139/ssrn.3411358


[4] Sommet, N. and Morselli, D., 2021. Keep Calm and Learn Multilevel Linear Modeling: A Three-Step Procedure Using SPSS, Stata, R, and Mplus. International Review of Social Psychology, 34(1), p.24. DOI: http://doi.org/10.5334/irsp.555


[5] Wilcox, J. 1999. Investing by the Numbers (Frank J. Fabozzi Series). Wiley


[6] McElreath, R. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and STAN, 2nd Edition. CRC Press


[7] Pedersen EJ, Miller DL, Simpson GL, Ross N. 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7:e6876 https://doi.org/10.7717/peerj.6876


[8] Various websites referenced containing background materials


Topic Reference
Mixed effects models using lme4 https://lme4.r-forge.r-project.org/book/
lme4 manual https://www.chalmers.se/sv/institutioner/math/forskning/forskarutbildning/forskarutbildning-matematisk-statistik/forskarutbildningskurser-matematisk-statistik/Documents/bates_manual.pdf
lme4 vignette https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf
Doug Bates presentation http://pages.stat.wisc.edu/~bates/UseR2008/WorkshopD.pdf
Example to find lme4 predict coefficients https://stats.stackexchange.com/questions/174203/predict-function-for-lmer-mixed-effects-models/174227
Various syntax notes https://yury-zablotski.netlify.app/post/mixed-effects-models-2/
Syntax cheat sheet https://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet
Sweep function https://stackoverflow.com/questions/3444889/how-to-use-the-sweep-function
Plot reference https://www.tjmahr.com/plotting-partial-pooling-in-mixed-effects-models/
Model diagnostics https://www.ssc.wisc.edu/sscc/pubs/MM/MM_DiagInfer.html
Shrinkage & correlation http://doingbayesiandataanalysis.blogspot.com/2019/07/shrinkage-in-hierarchical-models-random.html
Partial pooling https://bayesiancomputationbook.com/markdown/chp_04.html#mixing-group-and-common-parameters
Interesting article https://www.cbssports.com/mlb/news/mlb-analytics-guru-who-could-be-the-next-nate-silver-has-a-revolutionary-new-stat/
Interesting article https://www.baseballprospectus.com/news/article/26195/prospectus-feature-introducing-deserved-run-average-draand-all-its-friends/
Interesting article https://www.baseballprospectus.com/news/article/26196/prospectus-feature-dra-an-in-depth-discussion/
Robust Bayesian regression https://solomonkurz.netlify.app/post/2019-02-02-robust-linear-regression-with-student-s-t-distribution/
Intraclass Correlation Coefficient https://www.theanalysisfactor.com/the-intraclass-correlation-coefficient-in-mixed-models/
Shrinkage https://m-clark.github.io/posts/2019-05-14-shrinkage-in-mixed-models/
Is MLM necessary? https://mldscenter.maryland.gov/egov/Publications/ResearchSeries/Clustered%20Data,%20Are%20Multilevel%20Models%20Really%20Necessary.pdf

  1. This can be measured with the intraclass-correlation coefficient (ICC) / variance partition coefficient (VPC). A high ICC indicates that observations depend on cluster membership, and hence will experience less shrinkage. A low ICC / VPC can indicate that a multi-level modelling structure is not warranted. Also note the Design Effect discussed by Sommet & Morselli [4] that is designed to perform the same task.↩︎

  2. The intuition behind this is beyond me, and the mathematics supporting the PB-ROE model are pretty hairy.↩︎

  3. Although I can entertain the idea that P/B ratio is driven by growth in ROE across sector, and this may relate to investment horizon.↩︎