matchindex {truecluster}R Documentation

Matching for cross-tables

Description

Permutes rows and columns of a cross-table to match the marginal classes.

Usage

matchindex(observed, method = c("heuristic", "truematch", "tracemax")[1])
matchtable(observed, method = c("heuristic", "truematch", "tracemax")[1])
matchcalcs(k)

Arguments

observed an integer matrix representing a cross-table, non-square allowed
method one of c("heuristic","truematch","tracemax"), see details
k number of categories to match

Details

Method heuristic – the default – matches using the truematch heuristic, i.e. recursively calculating resisuals and removing the row/col with the biggest residuals. The number of residuals calculated during matching is given by function matchcalcs = 1/6*(n*(n+1)*(2*n+1)) - 1 (Otto, Forster, Analysis 1, Diffential und Integralrechnung einer Veränderlichen, Viehweg, Braunschweig (1992).). Thus the truematch heuristic has time-complexity O(n^3), and memory-complexity O(n^2), if replacing recursion by while loops as done here. Method truematch – the truematch algorithm – calculates residuals just once and then applies trace maximization (Munkres Hungarian Method). Method tracemax applies trace maximization directly without calculating residuals. All methods can cope with (non-square) rectangular matrices. All methods break ties by random. All methods use C implementations. All methods have polynomial time complexity.

Value

Function matchindex returns a list with components

row permutation index for the rows
col permutation index for the columns
row permutation index for the rows
col permutation index for the columns

Function matchcalcs returns the number of residuals to calculate during matching

Author(s)

Jens Oehlschlägel

References

xx

See Also

munkres, for another implementation of the Hungarian method see solve_LSAP, for greedy heuristics see matchClasses

Examples

m <- rbind(c(1, 98), c(0, 1))
i <- matchindex(m)
i
m[i$row, i$col]
i <- matchindex(m, method="truematch")
i
m[i$row, i$col]
i <- matchindex(m, method="tracemax")
i
m[i$row, i$col]
  ## Not run: 
     library(e1071)
     library(clue)
     #library(lpSolve)

     cat("Check quality with respect to trace maximization\n")
     x <- integer(6)
     names(x) <- c("e","s","l","t","r","m")
     for (i in 1:400){
        k <- 5
        m <- table(sample(1:k, 10000, TRUE), sample(1:k, 10000, TRUE))
        i.e <- matchClasses(m, method="exact")
        m.e <- m[,i.e]
        i.s <- solve_LSAP(m, maximum=TRUE)
        m.s <- m[,i.s]
        i.l <- apply(lp.assign(-m)$solution, 1, which.max)
        m.l <- m[,i.l]
        i.t <- matchindex(m, method="tracemax")
        m.t <- m[i.t$row, i.t$col]
        i.r <- matchindex(m, method="truematch")
        m.r <- m[i.r$row, i.r$col]
        i.m <- matchindex(m, method="truecluster")
        m.m <- m[i.m$row, i.m$col]
        x["e"] <- x["e"] + sum(diag(m.e))
        x["s"] <- x["s"] + sum(diag(m.s))
        x["l"] <- x["l"] + sum(diag(m.l))
        x["t"] <- x["t"] + sum(diag(m.t))
        x["r"] <- x["r"] + sum(diag(m.r))
        x["m"] <- x["m"] + sum(diag(m.m))
     }
     x / max(x)

     cat("Check speed\n")
     K <- 2^(2:9)
     n <- length(K)
     tim <- matrix(NA, nrow=n, ncol=3, dimnames=list(NULL, c("s","m","t")))
     for (i in 1:n){
        k <- K[i]
        m <- table(sample(1:k, 1000000, TRUE), sample(1:k, 1000000, TRUE))
        tim[i,"s"] <- system.time({solve_LSAP(m)})[3]
        tim[i,"m"] <- system.time({matchindex(m, method="truecluster")})[3]
        tim[i,"t"] <- system.time({matchindex(m, method="tracemax")})[3]
        matplot(K, tim, pch=c("s","m","t"))
     }
  ## End(Not run)

[Package truecluster version 0.3 Index]