| matchindex {truecluster} | R Documentation |
Permutes rows and columns of a cross-table to match the marginal classes.
matchindex(observed, method = c("heuristic", "truematch", "tracemax")[1])
matchtable(observed, method = c("heuristic", "truematch", "tracemax")[1])
matchcalcs(k)
observed |
an integer matrix representing a cross-table, non-square allowed |
method |
one of c("heuristic","truematch","tracemax"), see details |
k |
number of categories to match |
Method heuristic – the default – matches using the truematch heuristic, i.e. recursively calculating resisuals and removing the row/col with the biggest residuals. The number of residuals calculated during matching is given by function matchcalcs = 1/6*(n*(n+1)*(2*n+1)) - 1 (Otto, Forster, Analysis 1, Diffential und Integralrechnung einer Veränderlichen, Viehweg, Braunschweig (1992).). Thus the truematch heuristic has time-complexity O(n^3), and memory-complexity O(n^2), if replacing recursion by while loops as done here.
Method truematch – the truematch algorithm – calculates residuals just once and then applies trace maximization (Munkres Hungarian Method).
Method tracemax applies trace maximization directly without calculating residuals.
All methods can cope with (non-square) rectangular matrices. All methods break ties by random. All methods use C implementations. All methods have polynomial time complexity.
Function matchindex returns a list with components
row |
permutation index for the rows |
col |
permutation index for the columns |
row |
permutation index for the rows |
col |
permutation index for the columns |
Function matchcalcs returns the number of residuals to calculate during matching
Jens Oehlschlägel
xx
munkres, for another implementation of the Hungarian method see solve_LSAP, for greedy heuristics see matchClasses
m <- rbind(c(1, 98), c(0, 1))
i <- matchindex(m)
i
m[i$row, i$col]
i <- matchindex(m, method="truematch")
i
m[i$row, i$col]
i <- matchindex(m, method="tracemax")
i
m[i$row, i$col]
## Not run:
library(e1071)
library(clue)
#library(lpSolve)
cat("Check quality with respect to trace maximization\n")
x <- integer(6)
names(x) <- c("e","s","l","t","r","m")
for (i in 1:400){
k <- 5
m <- table(sample(1:k, 10000, TRUE), sample(1:k, 10000, TRUE))
i.e <- matchClasses(m, method="exact")
m.e <- m[,i.e]
i.s <- solve_LSAP(m, maximum=TRUE)
m.s <- m[,i.s]
i.l <- apply(lp.assign(-m)$solution, 1, which.max)
m.l <- m[,i.l]
i.t <- matchindex(m, method="tracemax")
m.t <- m[i.t$row, i.t$col]
i.r <- matchindex(m, method="truematch")
m.r <- m[i.r$row, i.r$col]
i.m <- matchindex(m, method="truecluster")
m.m <- m[i.m$row, i.m$col]
x["e"] <- x["e"] + sum(diag(m.e))
x["s"] <- x["s"] + sum(diag(m.s))
x["l"] <- x["l"] + sum(diag(m.l))
x["t"] <- x["t"] + sum(diag(m.t))
x["r"] <- x["r"] + sum(diag(m.r))
x["m"] <- x["m"] + sum(diag(m.m))
}
x / max(x)
cat("Check speed\n")
K <- 2^(2:9)
n <- length(K)
tim <- matrix(NA, nrow=n, ncol=3, dimnames=list(NULL, c("s","m","t")))
for (i in 1:n){
k <- K[i]
m <- table(sample(1:k, 1000000, TRUE), sample(1:k, 1000000, TRUE))
tim[i,"s"] <- system.time({solve_LSAP(m)})[3]
tim[i,"m"] <- system.time({matchindex(m, method="truecluster")})[3]
tim[i,"t"] <- system.time({matchindex(m, method="tracemax")})[3]
matplot(K, tim, pch=c("s","m","t"))
}
## End(Not run)