Summary for hdpGLM class — summary.hdpGLM • hdpGLM

This is the summary method for objects of class hdpGLM. It describes the output of the function hdpGLM.

# S3 method for hdpGLM
summary(object, ...)

Arguments

object

an object of the class hdpGLM generated by the function hdpGLM

...

Additional arguments accepted are:

true.beta: a data.frame with the true values of the linear coefficients beta, if they are known. It must contain the following columns: j, the index of the context associated with each linear coefficient beta, matching the context indexes used in the data set; k, the cluster of beta; Parameter, the name of the linear coefficient (beta1, beta2, ..., beta_dx, where dx is the number of covariates at the individual level, and beta1 is the coefficient of the intercept term); and True, the true value of the coefficient. Finally, the data.frame must contain columns with the context-level covariates used in the estimation by the hdpGLM function (see Details below).

true.tau: a data.frame with four columns. The first, named w, gives the index of each context-level covariate, starting with 0 for the intercept term. The second, named beta, gives the index of the beta of each individual-level covariate, starting with 0 for the intercept term. The third, named Parameter, holds the parameter names in the form tau<w><beta>, where w and beta are the actual values displayed in the columns w and beta. The fourth, named True, holds the true value of the parameter.
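
As an illustration of the two layouts described above, here is a minimal base R sketch. Everything specific in it is an assumption made for the example: two contexts, two clusters, an intercept plus one individual-level covariate, a single context-level covariate called W1, made-up true values, and tau names built by plain concatenation of w and beta. The covariate column names and the exact tau naming should be checked against the fitted model.

```r
# Hypothetical setup: 2 contexts, 2 clusters, intercept plus one covariate.
# W1 is an assumed name for a single context-level covariate.
contexts <- data.frame(j = 1:2, W1 = c(-0.5, 1.2))

# One row per (context j, cluster k, coefficient).
# expand.grid varies Parameter fastest, then k, then j.
true.beta <- expand.grid(
  Parameter = c("beta1", "beta2"),
  k = 1:2,
  j = 1:2,
  stringsAsFactors = FALSE
)
true.beta$True <- c(1.0, -2.0, 3.0, 0.5,   # context 1: cluster 1, cluster 2
                    0.8, -1.5, 2.5, 0.7)   # context 2: cluster 1, cluster 2

# Attach the context-level covariate so each row carries its context's values.
true.beta <- merge(true.beta, contexts, by = "j")
true.beta <- true.beta[order(true.beta$j, true.beta$k, true.beta$Parameter),
                       c("j", "k", "Parameter", "True", "W1")]

# true.tau: w indexes context-level covariates, beta indexes individual-level
# coefficients; 0 is the intercept in both.
true.tau <- expand.grid(w = 0:1, beta = 0:1)
# Names built by plain concatenation; the exact format is an assumption.
true.tau$Parameter <- paste0("tau", true.tau$w, true.tau$beta)
true.tau$True <- c(0.9, -0.4, 1.3, 0.2)    # made-up values
```

Given a fitted object, here hypothetically called fit, these would be passed through the additional arguments, as in summary(fit, true.beta = true.beta, true.tau = true.tau).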

Value

The function returns a list with two data.frames. The first summarizes the posterior distribution of the linear coefficients beta. The mean, median, and the 95% HPD interval are provided. The second data.frame contains the summary of the posterior distribution of the parameter tau.

Details

The function hdpGLM returns a list with the samples from the posterior distribution along with other elements. That list contains an element named context.cov that connects the index C created during the estimation to the context-level covariates, so each unique combination of context-level covariate values gets an index during the estimation. The algorithm only requires the context-level covariates, but it creates the index C to help the estimation.

If true.beta is provided, it must contain indexes for the context as well, which indicate the context of each specific linear coefficient beta. That index will probably differ from the one created by the algorithm. Therefore, when true.beta is provided, the context index C generated by the algorithm must be connected to the column j in the true.beta data.frame in order to compare the true and estimated values for each context. That is why the values of the context-level covariates are needed as well: the summary uses them as the key to merge the true and the estimated values for each context. The true and estimated clusters are matched based on the shortest distance between the estimated posterior average and the true value in each context, because the labels of the clusters in the estimation can vary even though the same data points are classified in the same clusters.
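
The two steps described above, linking C to j through the context-level covariates and then pairing clusters by shortest distance, can be sketched in base R. This is not the package's internal code. The column names C, W1 and Mean, the single coefficient per cluster, and the simple nearest-value rule are all assumptions chosen to keep the illustration small.

```r
# Estimated side: the algorithm's context index C with its covariate value.
# Column names C, W1 and Mean are assumptions for this sketch.
estimated <- data.frame(
  C = c(1, 1, 2, 2),
  k = c(1, 2, 1, 2),
  W1 = c(1.2, 1.2, -0.5, -0.5),
  Mean = c(2.4, 0.9, 2.9, 1.1)   # posterior means of one coefficient
)
# True side: the user's own context index j and cluster labels.
truth <- data.frame(
  j = c(1, 1, 2, 2),
  k = c(1, 2, 1, 2),
  W1 = c(-0.5, -0.5, 1.2, 1.2),
  True = c(1.0, 3.0, 0.8, 2.5)
)

# Step 1: link C and j through the shared context-level covariate.
key <- merge(unique(estimated[, c("C", "W1")]),
             unique(truth[, c("j", "W1")]), by = "W1")

# Step 2: within each context, pair every estimated cluster with the
# true cluster whose value is closest to the posterior mean.
match_context <- function(i) {
  est <- estimated[estimated$C == key$C[i], ]
  tru <- truth[truth$j == key$j[i], ]
  nearest <- sapply(est$Mean, function(m) which.min(abs(tru$True - m)))
  data.frame(C = key$C[i], j = key$j[i], k.est = est$k,
             k.true = tru$k[nearest], Mean = est$Mean,
             True = tru$True[nearest])
}
matched <- do.call(rbind, lapply(seq_len(nrow(key)), match_context))
```

In this toy data the algorithm's context 1 corresponds to the user's context 2, and the estimated cluster labels are swapped relative to the true ones, which is exactly the situation the covariate key and the distance-based matching are meant to handle. A nearest-value rule like this one does not guarantee a one-to-one pairing when clusters are close together.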