Jeromy Anglim's Notes

Assorted notes on statistics, R, psychological research, LaTeX, computing, etc. See also my primary blog for more substantive posts: jeromyanglim.blogspot.com

  • Jags: converting multilevel model from wide to long format

    Multilevel models can be specified in either wide or long format. Wide format is possible when the data is balanced (i.e., each group has the same number of observations). When the data is not balanced, long format is required.
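
Whether wide format is even an option can be checked by counting observations per group. The following is a minimal sketch: the data frame `Data` and its values are made up for illustration, and only the column name `id.i` is taken from the long-format example in this post.

```r
# Hypothetical long-format data: group 3 has only two observations
Data <- data.frame(
  id.i = c(1, 1, 1, 2, 2, 2, 3, 3),
  x    = c(0.1, 0.4, 0.9, 0.2, 0.5, 0.7, 0.3, 0.8),
  y    = c(1.2, 1.9, 2.7, 0.8, 1.5, 1.6, 2.1, 3.0)
)

# Number of observations in each group
n.per.group <- table(Data$id.i)

# The data is balanced only if every group has the same count
balanced <- length(unique(as.vector(n.per.group))) == 1
```

Here `balanced` is `FALSE`, so this data could not be stored as a complete N x J matrix and long format would be needed.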

    • How can a BUGS model specified in wide format be converted into one specified in long format?

    Here’s a random intercept JAGS model specified using wide format data

    for (i in 1:N) {
        for (j in 1:J) {
            mu[i,j] <- alpha[i] + beta * (X[i,j] - x.bar);
            Y[i,j]   ~ dnorm(mu[i, j], tau.c)
        }
        alpha[i] ~ dnorm(alpha.mu, alpha.tau);
    }
    

    With data formatted in R as follows

    jagsdata <- list(X=as.matrix(Data.wide.x), Y=as.matrix(Data.wide.y), N=N, J=J)
    

    And here’s the JAGS model in long format

    for (i in 1:N) {
        mu[i] <- alpha[id.i[i]] + beta * (X[i] - x.bar);
        Y[i]   ~ dnorm(mu[i], tau.c)
    }
    
    for (i in 1:I) {    
        alpha[i] ~ dnorm(alpha.mu, alpha.tau);
    }
    

    with the following R data line

    jagsdata <- list(X=Data$x, Y=Data$y, id.i=Data$id.i, N=nrow(Data), 
                     I=length(unique(Data$id.i)))
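
If the data starts out as the two wide matrices used earlier, one way to build the long data frame is sketched below. This is an assumption about the workflow rather than the only route: the matrix values are invented, and the sketch relies on base R only (`row()` and `as.vector()`, which both work in column-major order).

```r
# Hypothetical balanced wide data: 3 groups (rows) by 2 observations (columns)
Data.wide.x <- matrix(c(0.1, 0.2, 0.3,
                        0.4, 0.5, 0.6), nrow = 3)
Data.wide.y <- matrix(c(1.1, 1.2, 1.3,
                        1.4, 1.5, 1.6), nrow = 3)

# Stack the matrix cells into vectors; row() gives the row (group) index
# of every cell, unrolled in the same column-major order as the values
Data <- data.frame(
  id.i = as.vector(row(Data.wide.x)),
  x    = as.vector(Data.wide.x),
  y    = as.vector(Data.wide.y)
)

# Same data list as in the post, now built from the stacked data
jagsdata <- list(X = Data$x, Y = Data$y, id.i = Data$id.i, N = nrow(Data),
                 I = length(unique(Data$id.i)))
```

The order of the rows in `Data` does not matter to the long-format model, as long as `id.i`, `x` and `y` stay aligned row by row.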
    

    Note that the specification of priors and so forth is essentially unchanged.

    So, what’s changed?

    • Wide format uses matrix notation (e.g., mu[i,j]); long format uses vector notation (e.g., mu[i]).
    • Wide format has two nested for loops. The outer loop runs over rows (i.e., group ids); the inner loop runs over columns (i.e., observations within each group). The random intercept alpha[i] is defined in the outer loop, outside the inner loop, because it varies by group but not across observations within a group.
    • Long format has two separate loops. The first loops over every observation (i.e., every observation in every group), so N now counts observations rather than groups. The data includes an index variable (e.g., id.i) that has the same length as Y and records the group id of each observation; it determines which intercept, alpha[id.i[i]], applies to that observation. The second loop runs over the groups, so its length equals the number of groups (I).
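
Because the long-format model defines alpha[1] to alpha[I] and looks intercepts up with alpha[id.i[i]], the group ids need to be the integers 1 to I. When the raw ids are labels or have gaps, they can be recoded first. A minimal sketch, with made-up labels in a hypothetical vector `raw.id`:

```r
# Hypothetical raw group labels that are not consecutive integers
raw.id <- c("s07", "s07", "s12", "s12", "s12", "s31")

# factor() gives each distinct label a level; as.integer() returns
# the level codes, which run from 1 to the number of distinct labels
id.i <- as.integer(factor(raw.id))

# Number of groups, as passed to JAGS
I <- length(unique(id.i))
```

The recoded `id.i` can then be used in place of the raw ids when building the data list.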
    • December 7, 2012 (10:17 am)