Modeling Latent Effect Heterogeneity

This project provides a set of tools for investigating latent heterogeneity in the effects of treatment variables and other observed covariates. Latent heterogeneity arises when conditioning terms (i.e., interactive factors) are unobserved and therefore omitted from the empirical analysis.

In generalized linear models, omitting interactions can lead to latent occurrences of Simpson’s paradox, a long-standing problem in statistical analysis in general and in the social sciences in particular. Simpson’s paradox refers to the possibility that an effect found when data are aggregated is entirely different, or even reversed, when the data are separated and analyzed by group. Modeling solutions exist when the groups and their causal role are known (Pearl, 2011). But when the groups are latent, classical empirical approaches (GLM, mixed models, etc.) cannot detect or handle them, so latent heterogeneity or an occurrence of Simpson’s paradox can go unnoticed by the researcher. In practice, a researcher can conclude that an effect is positive when, in fact, it is positive for only one subgroup of the population and negative for the others.
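The scenario in the last sentence can be illustrated with a small simulation. This is a minimal sketch under assumed values, not an analysis from the project: the group shares (70% and 30%), the slopes (+1 and -1), the noise level, and the helper names `ols_slope` and `simulate` are all hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (one regressor plus an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=5000, seed=1):
    """Simulate data with a latent group that flips the sign of the effect of x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Latent group membership: assumed 70% / 30% split, unobserved in practice.
        group = 0 if rng.random() < 0.7 else 1
        x = rng.gauss(0.0, 1.0)
        slope = 1.0 if group == 0 else -1.0  # assumed group-specific effects
        y = slope * x + rng.gauss(0.0, 0.5)
        rows.append((group, x, y))
    return rows


rows = simulate()

# Pooled fit: what the researcher sees when the group is not observed.
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])

# Group-specific fits: only possible here because the simulation keeps the labels.
by_group = {
    g: ols_slope(
        [x for gg, x, _ in rows if gg == g],
        [y for gg, _, y in rows if gg == g],
    )
    for g in (0, 1)
}

print("pooled slope:", round(pooled, 2))
print("slope by latent group:", {g: round(b, 2) for g, b in by_group.items()})
```

The pooled slope comes out positive because the larger group dominates the average, while the smaller group has a clearly negative slope that the pooled fit hides.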

In comparative analysis, there is a further complication: different countries can have different latent factors conditioning different observed covariates. This problem is not new, and many researchers have recognized its importance and its implications for both observational and experimental studies (see, for instance, Adam Przeworski, 2007). Researchers cannot know a priori whether interactions or group heterogeneity have been omitted. Susan Stokes (2014), using different terms, argues that the ever-present possibility of omitting relevant interactions leads to an attitude of radical skepticism toward the results of observational and experimental empirical research in the social sciences. She says:

But from the standpoint of the radical skeptic, no research design can dispose of all potential interactions. Setting plausibility aside, if units have high dimensionality and if some confounders are unmeasurable, some unobserved trait is always likely to interact with the treatment. Faced with an experimental study that uncovers a causal effect, the radical skeptic should posit some unspecified subset of units whose response to treatment is at odds with the average response, potentially changing the theoretical implications of the study’s findings. If interactions can change the interpretation of experimental results, then the radical skeptic should be unnerved by their implication for experimental research. Because one can test only for interactions between treatments and observed factors, ungrounded skepticism implies that we will remain in the dark regarding the real findings of experimental studies.

Unobserved interactions [...] are omnipresent and inevitably limit the contribution of research to knowledge (Stokes, 2014, p. 46).

This project develops machine learning approaches and semi-parametric Bayesian (SPB) methods for dealing with these issues and for investigating whether interactions were omitted.

In the paper “Modeling Context-Dependent Latent Effect Heterogeneity,” published in Political Analysis, I propose a hierarchical Dirichlet mixture of generalized linear models to deal with this problem. With the model, researchers do not need to specify all interactions explicitly: it estimates marginal effects even when interactions are missing from the model specification. Moreover, unlike previous approaches, the method allows researchers to investigate whether contextual features such as schools, hospitals, neighborhoods, and country-specific institutional settings are associated with the emergence of latent heterogeneity in the effect of observables. I illustrate the model’s contributions with applications in political science that investigate attitudes toward financial aid and the effect of inequality on beliefs about meritocracy. The method is implemented in R and is available in the R package Hierarchical Dirichlet Process Generalized Linear Models (hdpGLM).
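One reason Dirichlet process mixtures do not require the number of latent groups to be fixed in advance is that the mixture weights can be generated by a stick-breaking construction. The sketch below shows that general construction in truncated form. It is an illustration of the standard technique only and is not taken from the paper or from the hdpGLM package; the function name `stick_breaking_weights`, the concentration value, and the truncation level are assumptions made for the example.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha: concentration parameter (> 0); larger values spread mass over more clusters.
    truncation: number of clusters kept in the finite approximation.
    rng: a random.Random instance, passed in for reproducibility.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any cluster
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share broken off the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # the last cluster absorbs the leftover mass
    return weights


rng = random.Random(42)
weights = stick_breaking_weights(alpha=1.0, truncation=10, rng=rng)
print([round(w, 3) for w in weights])
```

In a mixture of GLMs built this way, each weight would be paired with its own set of regression coefficients, so that a few clusters with non-negligible weight correspond to a few latent groups with distinct effects.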

I have used SPB models to study the latent structure of public support for welfare policies in OECD countries. I show that there is hidden polarization among the observed socioeconomic groups in some countries but not in others. My research indicates that one side effect of welfare policies in highly unequal societies with fragmented party systems is latent polarization in welfare policy preferences among individuals with similar observed socioeconomic characteristics. Countries with comparatively smaller welfare states (the USA, Japan, Australia, New Zealand) do not display such latent polarization.

Diogo Ferrari
Political Scientist
