java.lang.Object | +--HierClustNode
The HierClustNode class computes a general purpose hierarchical clustering tree. It recursively generates the cluster tree data structures for nObj objects (e.g. genes or samples) based on nDataV data vectors/object (i.e. expression vectors). If data is to be normalized, then it is normalized by the dataV[iDataV] entry.
It recursively constructs a cluster tree using the selected linkage method. The terminal nodes contain the genes. Building the tree also sets up the dGramXXXX[] lists that DrawClusterGram uses to draw the clustergram and dendrogram.
This work was produced by Peter Lemkin of the National Cancer
Institute, an agency of the United States Government. As a work of
the United States Government there is no associated copyright. It is
offered as open source software under the Mozilla Public License
(version 1.1) subject to the limitations noted in the accompanying
LEGAL file. This notice must be included with the code. The MAExplorer
Mozilla and Legal files are available on http://maexplorer.sourceforge.net/.
See Also: MAExplorer Home, ClusterGenes, ClusterGramCanvas, ClusterSamples, DrawClusterGram
Field Summary | |
(package private) static float[] |
cacheDistLR1Df
lower-diagonal cache of distLR[L][R]. |
(package private) static short[] |
cacheDistLR1Ds
lower-diagonal cache of distLR[L][R]. |
(package private) static int |
cacheSize
size of cacheDistLR1D[] cache |
(package private) int |
cIdx
Dynamic node: object cluster index if not -1 |
(package private) float[] |
dataV
Dynamic node: mean vector [0:nDataV-1] of left and right |
private float[] |
dataV1
Dynamic node: [0:nDataV-1] copy of dataV1[] |
private float[] |
dataV2
Dynamic node: [0:nDataV-1] copy of dataV1[] |
(package private) static float[] |
dGramDist
[0:nObj-2] distance between children |
(package private) static float[] |
dGramDistL
[0:nObj-2] distance between Left's children |
(package private) static float[] |
dGramDistR
[0:nObj-2] distance between Right's children |
(package private) static float[] |
dGramEnumOrder
[0:nObj-2] enumeration order |
(package private) static float[] |
dGramEOleft
[0:nObj-2] left child enum order |
(package private) static float[] |
dGramEOright
[0:nObj-2] right child enum order |
(package private) static int[] |
dGramLeftNNbr
[0:nObj-2] Left Node Number |
(package private) static int[] |
dGramNodeNbr
[0:nObj-2] Node Number |
(package private) static int[] |
dGramRightNNbr
[0:nObj-2] Right Node Number |
private float |
dist
Dynamic node: local variables allocated once for speedup |
(package private) float |
distLR
Dynamic node: nDistV*(distance**2) between right & left nodes |
(package private) static int |
enumCnt
# enumerated so far in an enumeration |
(package private) float |
enumOrder
Dynamic node: set to enumeration order in range [0:nObj-1]. |
private static FileIO |
fio
link to global FileIO instance |
(package private) HierClustNode |
hcLeft
Dynamic node: if not null, then left cluster |
(package private) HierClustNode |
hcParent
Dynamic node: if not null, then the parent of this node |
(package private) HierClustNode |
hcRight
Dynamic node: if not null, then right cluster |
(package private) static HierClustNode[] |
hierClusters
hierarchical clusters [0:2*nObj-1] nHierClusters is the current one. |
(package private) static int |
hierClustMode
linkage mode |
(package private) static int |
iDataV
col to norm by in [0:nDataV-1] |
private static MAExplorer |
mae
link to global MAExplorer instance |
(package private) static float |
maxDgramDist
max dGramDistXXX[] (distance not dist**2) |
(package private) static float |
maxDistLR
max distLR (dist**2) found between all nodes |
(package private) boolean |
memAllocFailedFlag
|
private static float[] |
mnDataV
Dynamic node: [0:nDataV-1] Global mean of data for samples h |
private static float[] |
mnDataVSq
Dynamic node: [0:nDataV-1] global mean Sq data for samples h |
(package private) static float[] |
msMaxDataS
cache mae.hps.msListE[].maQ.maxRI, for normalizing data |
(package private) java.lang.String |
name
Dynamic node: name of node if it is not null |
(package private) int |
nbrChildren
Dynamic node: # of terminal children of this node. |
(package private) static int |
nDataV
size of dataV[] vector |
(package private) static HierClustNode |
nextNode
next node in traversal |
(package private) static int |
nHierClusters
# of nodes clustered 2*nObj-1 |
(package private) static int |
nObj
# of objects to cluster |
(package private) int |
nodeID
Dynamic node: unique cluster node # |
(package private) static int |
nonTermEnumCnt
# non-terminal nodes enumerated in order |
(package private) static boolean |
normByHPXflag
flag: normalize data for gene in HP-X else HP[RatioHP] |
(package private) static boolean |
normHCbyRatioHPflag
flag: set to true if norm dataV[h] by dataV[iDataV] else norm by HP[h] msListE[h].maQ.maxRI |
(package private) static int[] |
objCidx
sorted cluster cIdx data [0:nObj-1] |
(package private) static float[][] |
objDataV
sorted norm. |
(package private) int |
oFeature
optional feature if not equal to -1 |
private static float[] |
sdDataV
Dynamic node: [0:nDataV-1] global StdDev data for samples h |
(package private) static boolean |
stdizeByMnStdFlag
flag: standardize all array dataV[h] by mnDataV[h]/sdDataV[h] |
private float |
sumCBdist
Dynamic node: sum of CB distances |
private float |
sumDistSq
Dynamic node: sum of distances squared |
(package private) static int |
termEnumCnt
# terminal nodes enumerated in order |
(package private) static HierClustNode |
topNode
top node of the tree |
(package private) static int |
topNodeID
top Node of tree |
(package private) static boolean |
unWeightedAvgFlag
flag: unweighted/weighted avg calc method |
(package private) static boolean |
useClusterDistCacheFlag
flag: cache distance matrix. |
(package private) static boolean |
useCorrCoeffFlag
flag: use corr. |
(package private) static boolean |
useShortCacheFlag
flag: use short else float cache |
(package private) boolean |
visited
Dynamic node: set to true if visited in tree traversal |
Constructor Summary | |
(package private) |
HierClustNode(HierClustNode hcRight,
HierClustNode hcLeft)
HierClustNode() - construct top level hierarchical cluster node |
(package private) |
HierClustNode(int cIdx,
float[] dataVorig,
java.lang.String name,
int oFeature)
HierClustNode() - construct bottom level hierarchical cluster nodes which will provide data for both a Clustergram and Dendrogram... |
(package private) |
HierClustNode(MAExplorer maE,
int nObj,
int nDataV,
GeneList ml,
float[][] geneEPvector)
HierClustNode() - setup static variables for ONLY ClusterGram for use with K-means clustering data. |
(package private) |
HierClustNode(MAExplorer maE,
int nObj,
int nDataV,
int hierClustMode,
boolean useCorrCoeffFlag,
boolean useClusterDistCacheFlag,
boolean normHCbyRatioHPflag,
boolean normByHPXflag,
int iDataV)
HierClustNode() - setup static variables for full hier clustering which will provide data for both a Clustergram and Dendrogram... |
Method Summary | |
private int |
addr1D(int x,
int y)
addr1D() - lookup lower-diagonal addr1D(x,y) [y' + x'*(x'+1)/2] where: x' = (y |
private static boolean |
buildClusterGramLists()
buildClusterGramLists() - construct ordered ClusterGram lists |
private static boolean |
buildDendrogramLists()
buildDendrogramLists() - construct ordered ClusterGram and dendrogram lists. |
(package private) boolean |
calcHierCluster(float[][] objectsVectors,
java.lang.String[] objNames,
int[] objFeatureList,
int hierClusterMode)
calcHierCluster() - compute the hierarchical clusters tree. |
private float[] |
calcMeanDataVect(HierClustNode c1,
HierClustNode c2)
calcMeanDataVect() - compute mean intensity vector of 2 nodes. |
private void |
calcStandardizeMnStdDev()
calcStandardizeMnStdDev() - calc global mnDataV[] and sdDataV[] over [0:nDataV-1] for all samples. |
(package private) static void |
cleanup()
cleanup() - close up what needs to be closed and GC all structures. |
(package private) static HierClustNode |
enumerateLeafNodes()
enumerateLeafNodes() - enumerate Leaf nodes (Right then left) of tree. |
(package private) static HierClustNode |
enumerateNodes()
enumerateNodes() - enumerate the nodes (Right then Left) of tree. |
(package private) HierClustNode |
findClosestNode(HierClustNode hcI,
int i)
findClosestNode() - find closest unvisited node hcJ to node hcI such that the distance between hcI and hcJ is a minimum. |
(package private) int[] |
findSubtreeOfNodeNbrs(int topNodeNbrOfSubtree)
findSubtreeOfNodeNbrs() - return an int[] list of node ID numbers for nodes in the subtree defined by topNodeNbrOfSubtree. |
private float |
getEnumOrder()
getEnumOrder() - lookup or RECURSIVELY compute enumOrder of this node. |
private float |
nodeNodeDistSQ(HierClustNode c1,
HierClustNode c2)
nodeNodeDistSQ() - compute node-node nDistV*(Euclidean distance squared). |
private float[] |
normalizeDataVector(float[] dataVorig)
normalizeDataVector() - compute normalized terminal dataV[] using selected ratio-normalization method. |
private void |
printDistanceMatrix()
printDistanceMatrix() - pretty print the distance matrix for debugging |
(package private) static boolean |
setEnumerateNodes()
setEnumerateNodes() - Set the top node of the tree |
private static void |
standarizeDataVector()
standarizeDataVector() - compute standardized dataV[] using (dataV[h]-mnDataV[h])/sdDataV[h] for ALL terminal objects [0:nObj-1]. |
(package private) java.lang.String |
toString(java.lang.String sMsg,
boolean addNameFlag,
boolean addDistFlag,
boolean addDataVectFlag)
toString() - convert value of a node to a String. |
Methods inherited from class java.lang.Object |
|
Field Detail |
private static MAExplorer mae
private static FileIO fio
static boolean useCorrCoeffFlag
static boolean useClusterDistCacheFlag
static boolean normByHPXflag
static boolean unWeightedAvgFlag
static boolean normHCbyRatioHPflag
static boolean stdizeByMnStdFlag
static int hierClustMode
static int cacheSize
static int enumCnt
static int termEnumCnt
static int nonTermEnumCnt
static int nObj
static int topNodeID
static int nHierClusters
static int nDataV
static int iDataV
static float[] msMaxDataS
static float maxDistLR
static float maxDgramDist
static float[][] objDataV
static int[] objCidx
static boolean useShortCacheFlag
static short[] cacheDistLR1Ds
static float[] cacheDistLR1Df
It is of size [0:((2*nObjs-1)**2/2)+(2*nObjs-1)] that is equivalent to cacheDistLR1D[addr1D(i,j)] where: i' = (j
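The lower-diagonal addressing referred to above can be illustrated with a small sketch. The formula y' + x'*(x'+1)/2 comes from the addr1D() summary; the definitions of x' and y' are cut off on this page, so the sketch assumes x' is the larger and y' the smaller of the two indices, which is the usual convention for a lower-triangular layout. The class name LowerDiagonalIndex and the helper triangleSize() are hypothetical and are not part of HierClustNode.

```java
/** Hypothetical helper illustrating lower-diagonal (triangular) 1-D addressing. */
final class LowerDiagonalIndex {
    /**
     * Map a symmetric (x, y) pair to a 1-D offset using y' + x'*(x'+1)/2.
     * ASSUMPTION: x' = max(x, y) and y' = min(x, y); the original definition
     * is truncated in the documentation.
     */
    static int addr1D(int x, int y) {
        int xp = Math.max(x, y);
        int yp = Math.min(x, y);
        return yp + xp * (xp + 1) / 2;
    }

    /** Minimal number of cells for the lower triangle (with diagonal) of an n x n matrix. */
    static int triangleSize(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        float[] cache = new float[triangleSize(4)];
        cache[addr1D(3, 1)] = 2.5f;               // store d(3,1)
        System.out.println(cache[addr1D(1, 3)]);  // read the same cell as d(1,3)
    }
}
```

Because the two indices are ordered before the offset is computed, d(i,j) and d(j,i) share one cell, which is what lets a symmetric distance matrix be cached in roughly half the space.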
static float[] dGramDist
static float[] dGramDistR
static float[] dGramDistL
static float[] dGramEnumOrder
static float[] dGramEOright
static float[] dGramEOleft
static int[] dGramNodeNbr
static int[] dGramRightNNbr
static int[] dGramLeftNNbr
static HierClustNode topNode
static HierClustNode nextNode
static HierClustNode[] hierClusters
boolean visited
int nodeID
int cIdx
int nbrChildren
float distLR
float enumOrder
java.lang.String name
int oFeature
HierClustNode hcParent
HierClustNode hcRight
HierClustNode hcLeft
float[] dataV
private static float[] mnDataV
private static float[] mnDataVSq
private static float[] sdDataV
private float dist
private float[] dataV1
private float[] dataV2
private float sumDistSq
private float sumCBdist
boolean memAllocFailedFlag
Constructor Detail |
HierClustNode(MAExplorer maE, int nObj, int nDataV, int hierClustMode, boolean useCorrCoeffFlag, boolean useClusterDistCacheFlag, boolean normHCbyRatioHPflag, boolean normByHPXflag, int iDataV)
Parameters:
maE - is instance of MAExplorer
nObj - is # of genes to cluster
nDataV - is size of vector
hierClustMode - is linkage mode
useCorrCoeffFlag - to use corr. coeff instead of Eucl. dist
useClusterDistCacheFlag - to save memory
normHCbyRatioHPflag - to norm dataV[]
normByHPXflag - if normalizing, then norm by HP data for gene in HP-X else by max value in HP
iDataV - is the column to norm by in [0:nDataV-1]
See Also: EventMenu.setClusterDisplayState(java.awt.CheckboxMenuItem, boolean), HPxyData, MaHybridSample, Util.showMsg1(java.lang.String, java.awt.Color, java.awt.Color), Util.showMsg2(java.lang.String), Util.showMsg3(java.lang.String)
HierClustNode(MAExplorer maE, int nObj, int nDataV, GeneList ml, float[][] geneEPvector)
Parameters:
maE - is instance of MAExplorer
nObj - is # of genes to cluster
nDataV - is size of vector
hierClustMode - is linkage mode
ml - is the GeneList of genes to use in clustering
geneEPvector - is the array of data vectors [nObj][nDataV]
See Also: HPxyData, normalizeDataVector(float[])
HierClustNode(int cIdx, float[] dataVorig, java.lang.String name, int oFeature)
Parameters:
cIdx - is index of external object
dataVorig - is the mean vector of (L+R)/2
name - of leaf object
oFeature - is the optional feature
See Also: normalizeDataVector(float[])
HierClustNode(HierClustNode hcRight, HierClustNode hcLeft)
Parameters:
hcRight - is the right child cluster
hcLeft - is the left child cluster
See Also: calcMeanDataVect(HierClustNode, HierClustNode), nodeNodeDistSQ(HierClustNode, HierClustNode)
Method Detail |
private float[] normalizeDataVector(float[] dataVorig)
For all h in [0:nDataV-1]: if normHCbyRatioHPflag is set, dataV'[h] = dataV[h] / dataV[iDataV]; otherwise normalize by the maximum value for each sample, dataV'[h] = dataV[h] / maxRawIntensity[h]. If stdizeByMnStdFlag is set, also compute mnData[h] and mnDataSq[h]; the data normalization is then finished later.
Parameters:
dataVorig - is original data vector
private void calcStandardizeMnStdDev()
private static void standarizeDataVector()
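The two scaling steps documented for normalizeDataVector() and standarizeDataVector() can be sketched as follows. This is only an illustration of the formulas quoted on this page (dataV[h]/dataV[iDataV], dataV[h]/maxRawIntensity[h], and (dataV[h]-mnDataV[h])/sdDataV[h]); the class VectorScaling, its method names, and the handling of a zero divisor are assumptions, not the MAExplorer implementation.

```java
/** Hypothetical helper sketching the normalization and standardization formulas. */
final class VectorScaling {
    /** Safe division. ASSUMPTION: a zero divisor yields 0 instead of Infinity/NaN. */
    private static float div(float num, float den) {
        return (den == 0.0f) ? 0.0f : num / den;
    }

    /** Ratio normalization: dataV'[h] = dataV[h] / dataV[iDataV]. */
    static float[] normalizeByColumn(float[] dataV, int iDataV) {
        float[] out = new float[dataV.length];
        float divisor = dataV[iDataV];
        for (int h = 0; h < dataV.length; h++) {
            out[h] = div(dataV[h], divisor);
        }
        return out;
    }

    /** Per-sample normalization: dataV'[h] = dataV[h] / maxRaw[h]. */
    static float[] normalizeByMax(float[] dataV, float[] maxRaw) {
        float[] out = new float[dataV.length];
        for (int h = 0; h < dataV.length; h++) {
            out[h] = div(dataV[h], maxRaw[h]);
        }
        return out;
    }

    /** Standardization: dataV'[h] = (dataV[h] - mn[h]) / sd[h]. */
    static float[] standardize(float[] dataV, float[] mn, float[] sd) {
        float[] out = new float[dataV.length];
        for (int h = 0; h < dataV.length; h++) {
            out[h] = div(dataV[h] - mn[h], sd[h]);
        }
        return out;
    }
}
```

Each method returns a new array rather than scaling in place, so the original vector stays available for a later standardization pass.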
boolean calcHierCluster(float[][] objectsVectors, java.lang.String[] objNames, int[] objFeatureList, int hierClusterMode)
It clusters objectsVectors[0:nObj-1][0:nDataV-1], where cIdx is the object index in [0:nObj-1], using the nodes hierClusters[0:(2*nObj-1)-1].
Parameters:
objectsVectors - is object vectors of size [nObj][nDataV]
objNames - is object names of size [nObj]
objFeatureList - is the optional feature list of size [nObj]
hierClusterMode - for clustering
See Also: Util.showMsg2(java.lang.String), buildClusterGramLists(), buildDendrogramLists(), calcStandardizeMnStdDev(), findClosestNode(HierClustNode, int), HierClustNode(MAExplorer, int, int, int, boolean, boolean, boolean, boolean, int), nodeNodeDistSQ(HierClustNode, HierClustNode), standarizeDataVector()
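As a rough picture of how nObj leaf objects grow into a tree of 2*nObj-1 nodes, the following sketch performs a generic bottom-up (agglomerative) clustering. It is not the calcHierCluster() code: the class AgglomerativeSketch, the exhaustive pair search, the squared Euclidean distance, and the leaf-count-weighted mean vector are all assumptions chosen to keep the example short.

```java
/** Hypothetical sketch of bottom-up (agglomerative) clustering; not the MAExplorer code. */
final class AgglomerativeSketch {
    /** A tree node: a leaf wraps one object, an inner node joins two children. */
    static final class Node {
        final int id;        // position in the node array
        final float[] v;     // data vector (mean vector for inner nodes)
        final Node left;
        final Node right;
        final int nLeaves;   // number of terminal nodes below (1 for a leaf)
        final float distLR;  // squared distance between the two children
        boolean merged;      // true once this node has been given a parent

        Node(int id, float[] v, Node left, Node right, int nLeaves, float distLR) {
            this.id = id;
            this.v = v;
            this.left = left;
            this.right = right;
            this.nLeaves = nLeaves;
            this.distLR = distLR;
        }
    }

    static float distSq(float[] a, float[] b) {
        float sum = 0.0f;
        for (int h = 0; h < a.length; h++) {
            float d = a[h] - b[h];
            sum += d * d;
        }
        return sum;
    }

    /** Cluster data[nObj][nDataV]; returns the top node of a tree with 2*nObj-1 nodes. */
    static Node cluster(float[][] data) {
        int nObj = data.length;
        if (nObj < 1) {
            throw new IllegalArgumentException("need at least one object");
        }
        Node[] nodes = new Node[2 * nObj - 1];
        for (int i = 0; i < nObj; i++) {
            nodes[i] = new Node(i, data[i].clone(), null, null, 1, 0.0f);
        }
        for (int n = nObj; n < nodes.length; n++) {
            int bi = -1;
            int bj = -1;
            float best = 0.0f;
            // ASSUMPTION: exhaustive search for the closest pair of unmerged nodes.
            for (int i = 0; i < n; i++) {
                if (nodes[i].merged) continue;
                for (int j = i + 1; j < n; j++) {
                    if (nodes[j].merged) continue;
                    float d = distSq(nodes[i].v, nodes[j].v);
                    if (bi < 0 || d < best) {
                        best = d;
                        bi = i;
                        bj = j;
                    }
                }
            }
            Node l = nodes[bi];
            Node r = nodes[bj];
            int total = l.nLeaves + r.nLeaves;
            float[] mean = new float[l.v.length];
            for (int h = 0; h < mean.length; h++) {
                // ASSUMPTION: mean weighted by the number of leaves in each child.
                mean[h] = (l.nLeaves * l.v[h] + r.nLeaves * r.v[h]) / total;
            }
            nodes[n] = new Node(n, mean, l, r, total, best);
            l.merged = true;
            r.merged = true;
        }
        return nodes[nodes.length - 1];
    }
}
```

Each pass of the outer loop creates exactly one inner node, so nObj leaves need nObj-1 merges, which accounts for the 2*nObj-1 node slots.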
private void printDistanceMatrix()
private final int addr1D(int x, int y)
Parameters:
x - coordinate of lower diagonal array
y - coordinate of lower diagonal array
HierClustNode findClosestNode(HierClustNode hcI, int i)
Parameters:
hcI - node to search for node closest to
i - corresponding to hcI
See Also: addr1D(int, int), nodeNodeDistSQ(HierClustNode, HierClustNode)
private final float nodeNodeDistSQ(HierClustNode c1, HierClustNode c2)
Compute average-arithmetic linkages from non-leaf distances using the average cluster methods. See Sneath & Sokal pg 215, 218, 230, ... It computes the average-arithmetic linkage if hierClustMode==mae.HIER_CLUST_PGMA_LNKG and the average-centroid linkage if hierClustMode==mae.HIER_CLUST_PGMC_LNKG. If unWeightedAvgFlag is set it uses the un-weighted average, else the weighted average. If useCorrCoeffFlag is set, it computes r = calcPearsonCorrCoef(c1.dataV, c2.dataV, nDataV, use population covar) and returns dist = (1-r)*(1-r); otherwise it computes the Euclidean distance squared.
Parameters:
c1 - node
c2 - node
See Also: MathMAE.calcPearsonCorrCoef(float[], float[], int, boolean), MathMAE.euclidDist(float[], float[], int, boolean)
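The choice between the two distance measures described for nodeNodeDistSQ() can be sketched as below. The helpers are hypothetical stand-ins written for this example; they do not call MathMAE, they leave out the linkage-mode handling, and they omit the nDistV factor mentioned in the method summary. Returning r = 0 for a constant vector is also an assumption.

```java
/** Hypothetical sketch of the correlation-based and Euclidean node distances. */
final class NodeDistanceSketch {
    /** Squared Euclidean distance over the first n entries. */
    static float euclidDistSq(float[] a, float[] b, int n) {
        double sum = 0.0;
        for (int h = 0; h < n; h++) {
            double d = a[h] - b[h];
            sum += d * d;
        }
        return (float) sum;
    }

    /**
     * Pearson correlation coefficient over the first n entries.
     * The 1/n (population) factors cancel in the ratio, so plain sums are used.
     * ASSUMPTION: a vector with zero variance gives r = 0.
     */
    static float pearson(float[] a, float[] b, int n) {
        double meanA = 0.0;
        double meanB = 0.0;
        for (int h = 0; h < n; h++) {
            meanA += a[h];
            meanB += b[h];
        }
        meanA /= n;
        meanB /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int h = 0; h < n; h++) {
            double da = a[h] - meanA;
            double db = b[h] - meanB;
            sxy += da * db;
            sxx += da * da;
            syy += db * db;
        }
        double denom = Math.sqrt(sxx * syy);
        return (denom == 0.0) ? 0.0f : (float) (sxy / denom);
    }

    /** dist = (1-r)*(1-r) when useCorrCoeff is set, else squared Euclidean distance. */
    static float dist(float[] a, float[] b, int n, boolean useCorrCoeff) {
        if (useCorrCoeff) {
            float r = pearson(a, b, n);
            return (1.0f - r) * (1.0f - r);
        }
        return euclidDistSq(a, b, n);
    }
}
```

With the correlation measure the distance runs from 0 for perfectly correlated vectors to 4 for perfectly anti-correlated ones, regardless of their magnitudes.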
private float[] calcMeanDataVect(HierClustNode c1, HierClustNode c2)
1. WPGMA weighted terminal method: equal weights (i.e. 1/2) for the left and right subclusters, so the result is the mean of the left and right node values. NOTE: this weights large and small clusters the same!
2. UPGMA (unweighted pair-group method using arithmetic averages): [#c1*val1 + #c2*val2] / # terminal nodes, i.e. the sum over the terminal nodes divided by the number of terminal nodes.
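The difference between the two averaging rules can be shown in a few lines. This is a sketch under assumptions: the class MeanVectorSketch is hypothetical, the child sizes n1 and n2 stand for the number of terminal nodes (#c1 and #c2), and a true flag is taken to select the UPGMA rule.

```java
/** Hypothetical sketch of the WPGMA and UPGMA mean-vector rules. */
final class MeanVectorSketch {
    /**
     * Combine two child vectors into a parent vector.
     * ASSUMPTION: unWeightedAvg == true selects UPGMA (each child weighted by its
     * number of terminal nodes); false selects WPGMA (each child weighted 1/2).
     */
    static float[] mean(float[] v1, int n1, float[] v2, int n2, boolean unWeightedAvg) {
        float[] out = new float[v1.length];
        for (int h = 0; h < v1.length; h++) {
            if (unWeightedAvg) {
                out[h] = (n1 * v1[h] + n2 * v2[h]) / (n1 + n2);
            } else {
                out[h] = 0.5f * (v1[h] + v2[h]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] small = {0.0f};   // child with 1 terminal node
        float[] large = {4.0f};   // child with 3 terminal nodes
        System.out.println(mean(small, 1, large, 3, true)[0]);
        System.out.println(mean(small, 1, large, 3, false)[0]);
    }
}
```

When both children hold the same number of terminal nodes the two rules give the same vector; they diverge only when the subclusters differ in size.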
Parameters:
c1 - node
c2 - node
static boolean setEnumerateNodes()
static HierClustNode enumerateNodes()
Note: terminal nodes are indicated by cIdx==-1. topNode.hcParent = topNode;
static HierClustNode enumerateLeafNodes()
See Also: enumerateNodes()
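The right-then-left visiting order used by the enumeration methods can be pictured with a small recursive sketch. It collects all leaves into a list in one call, which is an assumption made for brevity and may differ from how enumerateLeafNodes() hands back nodes; the class LeafOrderSketch and its Node type are hypothetical, and a leaf is taken to be any node without children.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a right-then-left leaf enumeration of a binary tree. */
final class LeafOrderSketch {
    static final class Node {
        final String name;
        final Node left;
        final Node right;

        Node(String name, Node left, Node right) {
            this.name = name;
            this.left = left;
            this.right = right;
        }

        /** ASSUMPTION: a node with no children is a terminal (leaf) node. */
        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Return the leaf names, visiting the right subtree before the left one. */
    static List<String> leavesRightThenLeft(Node top) {
        List<String> out = new ArrayList<>();
        collect(top, out);
        return out;
    }

    private static void collect(Node n, List<String> out) {
        if (n == null) {
            return;
        }
        if (n.isLeaf()) {
            out.add(n.name);
            return;
        }
        collect(n.right, out);
        collect(n.left, out);
    }
}
```

For a tree whose top node has the pair (A, B) on the left and the leaf C on the right, the sketch yields the order C, B, A.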
int[] findSubtreeOfNodeNbrs(int topNodeNbrOfSubtree)
Need to traverse:
dGramNodeNbr[0:nObj-1] - Node Number
dGramRightNNbr[0:nObj-1] - Right Node Number
dGramLeftNNbr[0:nObj-1] - Left Node Number
Parameters:
topNodeNbrOfSubtree - defines subtree node
private float getEnumOrder()
private static boolean buildClusterGramLists()
objDataV[] and objCidx[] for rows in the ClusterGram
dGramXXXX[] for drawing the dendrogram.
See Also: enumerateLeafNodes(), setEnumerateNodes()
private static boolean buildDendrogramLists()
objDataV[] and objCidx[] for rows in the ClusterGram
dGramXXXX[] for drawing the dendrogram.
See Also: enumerateNodes(), getEnumOrder(), setEnumerateNodes()
java.lang.String toString(java.lang.String sMsg, boolean addNameFlag, boolean addDistFlag, boolean addDataVectFlag)
Parameters:
sMsg - is optional message
addNameFlag - to add name to the string
addDistFlag - to add distance to the string
addDataVectFlag - to add dataVect to the string
static void cleanup()