| extract_plot {fabia} | R Documentation |
extract_plot: R implementation of extract_plot.
extract_plot(X,L,Z,thresZ=0.5,ti,thresL=NULL,Y=NULL,x11b=TRUE,norm=1)
X |
original data matrix. |
L |
loading, left matrix. |
Z |
factor, right matrix. |
thresZ |
threshold for sample belonging to bicluster (default 0.5). |
thresL |
threshold for loading belonging to bicluster (estimated if not given). |
ti |
plot title. |
Y |
noise free data matrix. |
x11b |
plot on screen. |
norm |
whether and how the data are standardized: 1 (default) standardizes using the mean, 2 standardizes using the median. |
Essentially the model is the sum of outer products of vectors. The number of summands p is the number of biclusters.
X = L Z + U
X = sum_{i=1}^{p} L_i (Z_i )^T + U
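The two forms of the model above can be checked against each other numerically. The following is a minimal sketch in base R on small random matrices. The sizes `n`, `l`, `p` and the variable names are chosen only for illustration, and it assumes L is n x p and Z is p x l, so that row i of Z plays the role of (Z_i)^T.

```r
set.seed(1)
n <- 6; l <- 4; p <- 2                      # illustrative sizes (assumption)
L <- matrix(rnorm(n * p), n, p)             # loadings, one column per bicluster
Z <- matrix(rnorm(p * l), p, l)             # factors, one row per bicluster
U <- matrix(rnorm(n * l, sd = 0.1), n, l)   # additive noise

# Matrix form: X = L Z + U
X_matrix <- L %*% Z + U

# Sum form: X = sum_i L_i (Z_i)^T + U
X_sum <- U
for (i in seq_len(p)) {
  X_sum <- X_sum + outer(L[, i], Z[i, ])    # outer product of column i and row i
}

# Both forms agree up to floating point error
agree <- isTRUE(all.equal(X_matrix, X_sum))
```

Each summand is a rank-one matrix, which is why a bicluster corresponds to one pair of sparse vectors L_i and Z_i.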
The hidden dimension p is used for kmeans clustering of L_i and Z_i.
The L_i and Z_i are used to extract bicluster i, where a threshold determines which observations and which samples belong to the bicluster.
The function produces the plots listed below.
Plots:
“Y”: noise free data (if available),
“X”: data,
“LZ”: reconstructed data,
“LZ-X”: error,
“abs(Z)”: absolute factors,
“abs(L)”: absolute loadings,
“abs(nL)”: absolute loadings normalized,
“abs(nZ)”: absolute factors normalized,
“nZ*pmZ”: factors sorted,
“pmL*nL”: loadings sorted,
“pmL*L*z*pmZ”: reconstructed matrix sorted,
“pmL*X*pmZ”: original matrix sorted.
In the above plots the matrix L and the matrix
Z are sorted. For sorting, kmeans is first
performed in the p-dimensional space, and then the vectors
which belong to the same cluster are placed next to each other in the sorting.
This sorting is made for visualization, but in general
it is not possible to
visualize all biclusters as blocks if they overlap.
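The sorting step described above can be illustrated with base R. This is a minimal sketch, not the code used inside extract_plot: the toy loading matrix, the choice of three centers, and the use of `order()` on the cluster labels are assumptions made for the example. Only `kmeans` from the stats package is used.

```r
set.seed(1)
n <- 12; p <- 2
# Toy loadings: rows are observations, viewed as points in p-dimensional space
L <- matrix(0, n, p)
L[c(2, 5, 9, 11), 1] <- 3        # observations loading on component 1
L[c(1, 4, 7), 2] <- -3           # observations loading on component 2
L <- L + matrix(rnorm(n * p, sd = 0.1), n, p)

# Cluster the rows of L (number of centers is an assumption for this toy case)
km <- kmeans(L, centers = 3, nstart = 10)

# Put rows of the same cluster next to each other
ord <- order(km$cluster)

# The same reordering written as a permutation matrix
pmL <- diag(n)[ord, ]
L_sorted <- pmL %*% L
```

Left-multiplying by `pmL` reorders rows. The analogous step for Z would cluster the columns of Z and right-multiply by a column permutation.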
In bic the biclusters are extracted according to the
largest absolute values of component i, i.e.
the largest absolute values of L_i and the
largest absolute values of Z_i. The factors Z_i
are normalized to variance 1.
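Normalizing the factors to variance 1 can be sketched as follows. This is an illustration under assumptions, not the package code: it uses the sample standard deviation of each row of Z, and it rescales the columns of L by the same amount so that the product L Z is unchanged. Whether extract_plot compensates in L in exactly this way is an assumption.

```r
set.seed(1)
p <- 3; l <- 50; n <- 20
# Row i of Z is drawn with its own standard deviation (0.5, 2, 4)
Z <- matrix(rnorm(p * l, sd = c(0.5, 2, 4)), p, l)
L <- matrix(rnorm(n * p), n, p)

sdZ <- apply(Z, 1, sd)        # one standard deviation per factor
nZ  <- Z / sdZ                # divide row i by sdZ[i] (recycling over rows)
nL  <- sweep(L, 2, sdZ, "*")  # multiply column i of L by sdZ[i]

row_vars <- apply(nZ, 1, var) # each entry is 1 up to rounding
```

After this rescaling the absolute values in different rows of `nZ` are on a common scale, so a single threshold such as thresZ can be applied to every bicluster.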
The components of bic are
bin, bixv,
bixn, biypv, biypn, biynv,
and biynn.
bin gives the size of the bicluster: number of observations,
number of positive samples, number of negative samples.
bixv gives the values of the observations that have absolute
values above a threshold. They are sorted and
bixn gives their names (e.g. gene names).
biypv gives the values of the samples that have
values above a threshold. They are sorted and
biypn gives their names (e.g. sample names).
biynv gives the values of the samples that have
values below this threshold. They are sorted and
biynn gives their names (e.g. sample names).
That means the samples are divided into two groups, where one group shows large positive values and the other group has negative values with large absolute values. In other words, an observation pattern can be switched on or switched off relative to the average value.
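The components of one bicluster can be mimicked on toy vectors. The following sketch is only an illustration of the description above: the vectors `L_i` and `Z_i`, the value of `thresL`, the use of `-thresZ` as the cutoff for negative samples, and the sort order are all assumptions, not the rules implemented in extract_plot.

```r
# Toy loading and factor vectors for one bicluster i (hypothetical values)
L_i <- c(g1 = 0.1, g2 = -1.4, g3 = 0.9, g4 = 0.05)
Z_i <- c(s1 = 2.1, s2 = -0.2, s3 = -1.7, s4 = 0.8, s5 = 0.3)
thresL <- 0.8   # assumed here; the function estimates it when not given
thresZ <- 0.5   # the documented default

# Observations whose absolute loading exceeds the threshold, largest first
keep_x <- L_i[abs(L_i) > thresL]
bixv <- keep_x[order(abs(keep_x), decreasing = TRUE)]
bixn <- names(bixv)

# Samples with large positive factor values
biypv <- sort(Z_i[Z_i > thresZ], decreasing = TRUE)
biypn <- names(biypv)

# Samples with large negative factor values (assumed cutoff: -thresZ)
biynv <- sort(Z_i[Z_i < -thresZ])
biynn <- names(biynv)

# Size of the bicluster: observations, positive samples, negative samples
bin <- c(length(bixv), length(biypv), length(biynv))
```

In this toy case the samples s1 and s4 show the pattern switched on, while s3 shows it switched off relative to the average.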
numn gives the indexes of bic with components:
numn1 = bix, numn2 = biyp, and
numn3 = biyn.
The kmeans clusters are given by biclust with
components biclustx (the clustered observations)
and biclusty (the clustered samples).
Implementation in R.
bic |
extracted biclusters. |
numn |
indexes for the extracted biclusters. |
biclust |
clusters of kmeans clustering. |
pmZ |
permutation matrix of Z from kmeans clustering. |
pmL |
permutation matrix of Lambda from kmeans clustering. |
nL |
normalized loadings (left matrix). |
nZ |
normalized factors (right matrix). |
Xord |
sorted original matrix according to kmeans on Z and kmeans on Lambda. |
Sepp Hochreiter
fabi,
fabia,
fabiap,
fabias,
fabiasp,
mfsc,
nmfdiv,
nmfeu,
nmfsc,
nprojfunc,
projfunc,
make_fabi_data,
make_fabi_data_blocks,
make_fabi_data_pos,
make_fabi_data_blocks_pos,
extract_bic,
myImagePlot,
PlotBicluster,
Breast_A,
DLBCL_B,
Multi_A,
fabiaDemo,
fabiaVersion
#---------------
# TEST
#---------------

dat <- make_fabi_data_blocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]
X <- X- rowMeans(X)
XX <- (1/ncol(X))*tcrossprod(X)
dXX <- 1/sqrt(diag(XX)+0.001*as.vector(rep(1,nrow(X))))
X <- dXX*X

resEx <- fabia(X,20,0.3,1.0,1.0,3)

rEx <- extract_plot(X,resEx$L,resEx$Z,ti="FABIA",Y=Y,x11b=FALSE)

rEx$bic[1,]
rEx$bic[2,]
rEx$bic[3,]

rEx$biclust[1,]
rEx$biclust[2,]
rEx$biclust[3,]

## Not run:
#---------------
# DEMO1
#---------------

dat <- make_fabi_data_blocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]

resToy <- fabia(X,200,0.4,1.0,1.0,13)

rToy <- extract_plot(X,resToy$L,resToy$Z,ti="FABIA",Y=Y)

#---------------
# DEMO2
#---------------

data(Breast_A)

X <- as.matrix(XBreast)

resBreast <- fabia(X,200,0.1,1.0,1.0,5)

rBreast <- extract_plot(X,resBreast$L,resBreast$Z,ti="FABIA Breast cancer(Veer)")

#sorting of predefined labels
CBreast

## End(Not run)