Class MJAcluster

java.lang.Object
  |
  +--MJAbase
        |
        +--MJAcluster

public class MJAcluster
extends MJAbase

MAExplorer Open Java API class to access MJAcluster methods and data structures. Access cluster data structures and support methods.

List of methods available to Plugin-writers

 get_useCorrelationCoefficientFlag() - get gene-gene distance metric flag.
 set_hierClusterUnWeightedAvgFlag() - set hierarchical-clustering averaging method flag.
 get_clusterOnFilteredGenesFlag() - cluster on Filtered genes else All genes.
 get_useSimilarGeneClusterDisplayFlag() - if similar-genes clustering method is active.
 get_useClusterCountsDisplayFlag() - if gene cluster counts (# of similar genes) method is active.
 get_normHierClusterByRatioHPFlag() - normalize hierarchical clustering data by ratio to HP-X.
 get_useMedianForKmeansClusteringFlag() - use K-median else default K-means.
 get_useHierClusterDisplayFlag() - if hierarchical clustering method is active.
 get_useKmeansClusterCountsDispFlag() - show K-means cluster counts.
 get_useLSQmagnitudeNormalizationFlag() - normalize cluster expression vectors to magnitude 1.0.
 get_useClusterDistanceCacheFlag() - use cluster distance cache.
 get_useShortClusterDistanceCacheFlag() - use short[] cluster distance cache.
 getKmeansClusters() - list of K-means clustering data.
 getClusterOfSimilarGenes() - list of genes in cluster similar to seed gene.
 getHierClusterOfGenes() - get Hashtable list of hierarchical gene cluster.
 addr1D() - look up lower-triangular address addr1D(x,y) [y' + x'*(x'+1)/2].
 computeGeneGeneDistanceMatrix() - return gene-gene cluster distance matrix.
 calcNormGeneVectors() - compute normalized HP-E intensity vectors for a gene list.
 findGeneWithLeastSumDistances() - find gene with minimum sum of cluster distances.

This work was produced by Peter Lemkin of the National Cancer Institute, an agency of the United States Government. As a work of the United States Government there is no associated copyright. It is offered as open source software under the Mozilla Public License (version 1.1) subject to the limitations noted in the accompanying LEGAL file. This notice must be included with the code. The MAExplorer Mozilla and Legal files are available on http://maexplorer.sourceforge.net/.

Version:
$Date: 2003/07/07 21:40:41 $ $Revision: 1.7 $
Author:
P. Lemkin (NCI), J. Evans (CIT), C. Santos (CIT), G. Thornwall (SAIC), NCI-Frederick, Frederick, MD
See Also:
MAExplorer Home

Fields inherited from class MJAbase
COMPARE_ALL, COMPARE_ANY, COMPARE_AT_LEAST, COMPARE_AT_MOST, COMPARE_PRODUCT, COMPARE_SUM, DATA_F1TOT, DATA_F2TOT, DATA_MEAN_F1F2TOT, DATA_RATIO_F1F2TOT, DRAW_BIN, DRAW_BOX, DRAW_CIRCLE, DRAW_PLUS, EDIT_ADD, EDIT_NOP, EDIT_RMV, GENE_ATCC_ID, GENE_BAD_DATA, GENE_BAD_LOCAL_SPOT_BKGRD, GENE_BAD_MID, GENE_BAD_SPOT, GENE_BAD_SPOT_GEOMETRY, GENE_DUP_SPOT, GENE_GOOD_MID, GENE_IMAGE_ID, GENE_IS_CUR_GENE, GENE_IS_EGL_GENE, GENE_IS_FILTERED, GENE_IS_KMEANS, GENE_IS_NOT_FILTERED, GENE_LOW_SPOT_REF_SIGNAL, GENE_MARGINAL_SPOT, GENE_USE_GBID_FOR_CLONEID, HIER_CLUST_NEXT_MIN_LNKG, HIER_CLUST_PGMA_LNKG, HIER_CLUST_PGMC_LNKG, MARKER_CIRCLE, MARKER_CURRENT, MARKER_GENES, MARKER_KMEANS_CLUSTER, MARKER_NONE, MARKER_PLUS, MARKER_SQUARE, MASTER_CLONE_ID, MASTER_DBEST3, MASTER_DBEST5, MASTER_GENBANK, MASTER_GENBANK3, MASTER_GENBANK5, MASTER_GENE_NAME, MASTER_GENERIC_ID, MASTER_LOCUSLINK, MASTER_SWISS_PROT, MASTER_UG_ID, MASTER_UG_NAME, MAX_COLORS, PLOT_CLUSTER_GENES, PLOT_CLUSTER_HIER, PLOT_CLUSTER_HYBSAMPLES, PLOT_CLUSTERGRAM, PLOT_EXPR_PROFILE, PLOT_F1_F2_INTENS, PLOT_F1_F2_MVSA, PLOT_HIST_F1F2_RATIO, PLOT_HIST_HP_XY_RATIO, PLOT_HIST_HP_XY_SETS_RATIO, PLOT_HP_XY_INTENS, PLOT_INTENS_HIST, PLOT_KMEANS_CLUSTERGRAM, PLOT_PSEUDO_F1F2_IMG, PLOT_PSEUDO_F1F2_RYG_IMG, PLOT_PSEUDO_HP_XY_IMG, PLOT_PSEUDO_HP_XY_RYG_IMG, PLOT_PSEUDOIMG, PRPROP_CUR_GENE, PRPROP_FILTER, PRPROP_LABEL, PRPROP_SLIDER, PRPROP_TIMEOUT, PRPROP_UNIQUE, QUALTYPE_ALPHA, QUALTYPE_PROP_CODE, QUALTYPE_THR, RANGE_INSIDE, RANGE_OUTSIDE, RPT_FMT_DYN, RPT_FMT_TAB_DELIM, RPT_NONE, RPT_TBL_ALL_GENES_CLUSTER, RPT_TBL_CALIB_DNA_STAT, RPT_TBL_CUR_GENE_CLUSTER, RPT_TBL_EDITED_GENE_LIST, RPT_TBL_EXPR_PROFILE, RPT_TBL_FILTERED_GENES, RPT_TBL_GENE_CLASS, RPT_TBL_HIER_CLUSTER, RPT_TBL_HIGH_F1F2, RPT_TBL_HIGH_RATIO, RPT_TBL_HP_DB_INFO, RPT_TBL_HP_HP_CORR, RPT_TBL_HP_MN_VAR_STAT, RPT_TBL_HP_XY_SET_STAT, RPT_TBL_KMEANS_CLUSTER, RPT_TBL_LOW_F1F2, RPT_TBL_LOW_RATIO, RPT_TBL_MAE_PRJ_DB, RPT_TBL_MN_KMEANS_CLUSTER, RPT_TBL_NAMED_GENES, 
RPT_TBL_NORMALIZATION_GENE_LIST, RPT_TBL_OCL_STAT, RPT_TBL_SAMPLES_DB_INFO, RPT_TBL_SAMPLES_WEB_LINKS, SS_MODE_ELIST, SS_MODE_MS, SS_MODE_XANDY_SETS, SS_MODE_XORY_SETS, SS_MODE_XSET, SS_MODE_XY, SS_MODE_YSET
 
Method Summary
 int addr1D(int x, int y)
          addr1D() - look up lower-triangular address addr1D(x,y) [y' + x'*(x'+1)/2].
 float[][] calcNormGeneVectors(java.lang.String geneListToNormalize)
          calcNormGeneVectors() - compute HP-E intensity vectors for the genes in geneListToNormalize.
 float[] computeGeneGeneDistanceMatrix(java.lang.String geneListToUse)
          computeGeneGeneDistanceMatrix() - return gene-gene cluster distance matrix.
 int findGeneWithLeastSumDistances(float[] ccDist1D, int nGenes)
          findGeneWithLeastSumDistances() - find gene with minimum cluster distance to all other genes.
 boolean get_clusterOnFilteredGenesFlag()
          get_clusterOnFilteredGenesFlag() - get flag to cluster on Filtered genes else All genes.
 boolean get_normHierClusterByRatioHPFlag()
          get_normHierClusterByRatioHPFlag() - get flag for normalizing each HP-E gene expression vector dataV[1:nHP] by its HP-X value dataV[HP-X] (ratio to HP-X).
 boolean get_useClusterCountsDisplayFlag()
          get_useClusterCountsDisplayFlag() - get flag indicating whether the gene cluster counts method is active.
 boolean get_useClusterDistanceCacheFlag()
          get_useClusterDistanceCacheFlag() - get flag to use cluster distance cache to speed up computations.
 boolean get_useCorrelationCoefficientFlag()
          get_useCorrelationCoefficientFlag() - get gene-gene distance metric flag.
 boolean get_useHierClusterDisplayFlag()
          get_useHierClusterDisplayFlag() - get flag to show hierarchical clusters in the pseudoarray image for all Filtered genes.
 boolean get_useKmeansClusterCountsDispFlag()
          get_useKmeansClusterCountsDispFlag() - get flag to show K-means cluster counts for all Filtered genes in the pseudoarray image.
 boolean get_useLSQmagnitudeNormalizationFlag()
          get_useLSQmagnitudeNormalizationFlag() - get flag to normalize cluster expression vector geneEPvect[] to a magnitude of 1.0 for clustering.
 boolean get_useMedianForKmeansClusteringFlag()
          get_useMedianForKmeansClusteringFlag() - get flag for K-median else K-means clustering.
 boolean get_useShortClusterDistanceCacheFlag()
          get_useShortClusterDistanceCacheFlag() - get flag to use short[] cluster distance cache to save memory.
 boolean get_useSimilarGeneClusterDisplayFlag()
          get_useSimilarGeneClusterDisplayFlag() - get flag indicating whether the similar-genes cluster method is active.
 java.util.Hashtable getClusterOfSimilarGenes(java.lang.String geneListToCluster, float curGeneDistanceThr, int initialSeedGeneMID)
          getClusterOfSimilarGenes() - get Hashtable list of the cluster of genes similar to a seed gene.
 java.util.Hashtable getHierClusterOfGenes(java.lang.String geneListToCluster)
          getHierClusterOfGenes() - get Hashtable list of the hierarchical cluster of genes.
 java.util.Hashtable getKmeansClusters(java.lang.String geneListToCluster, int nbrOfClusters, int initialSeedGeneMID)
          getKmeansClusters() - get Hashtable list of K-means clustering data.
 void set_hierClusterUnWeightedAvgFlag(boolean value)
          set_hierClusterUnWeightedAvgFlag() - set hierarchical-clustering averaging method flag.
 
Methods inherited from class MJAbase
cvtHashtable2SimpleTable, cvtTable2Hashtable
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

get_useCorrelationCoefficientFlag

public final boolean get_useCorrelationCoefficientFlag()
get_useCorrelationCoefficientFlag() - get gene-gene distance metric flag. It is true if the correlation coefficient is used to compute the gene-gene distance metric, false if Euclidean distance is used.
Returns:
value of flag
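
As a sketch of what this flag switches between, the class below computes both metrics for two expression vectors. It assumes the correlation coefficient is Pearson's r, converted to a distance as 1 - r; the description above does not say which conversion MAExplorer uses, and GeneDistanceSketch and its methods are hypothetical names.

```java
/** Hypothetical sketch of the two gene-gene distance metrics; not MAExplorer source code. */
final class GeneDistanceSketch {

  /** Euclidean distance between two equal-length expression vectors. */
  static double euclidean(float[] a, float[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Pearson correlation coefficient in [-1, 1]; returns 0 if either vector is constant. */
  static double pearson(float[] a, float[] b) {
    int n = a.length;
    double meanA = 0.0, meanB = 0.0;
    for (int i = 0; i < n; i++) {
      meanA += a[i];
      meanB += b[i];
    }
    meanA /= n;
    meanB /= n;
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (int i = 0; i < n; i++) {
      double da = a[i] - meanA, db = b[i] - meanB;
      sab += da * db;
      saa += da * da;
      sbb += db * db;
    }
    return (saa == 0.0 || sbb == 0.0) ? 0.0 : sab / Math.sqrt(saa * sbb);
  }

  /** Select the metric as the flag describes; 1 - r is an assumed correlation distance. */
  static double distance(float[] a, float[] b, boolean useCorrelationCoefficient) {
    return useCorrelationCoefficient ? 1.0 - pearson(a, b) : euclidean(a, b);
  }

  public static void main(String[] args) {
    float[] g1 = {1f, 2f, 3f};
    float[] g2 = {2f, 4f, 6f};
    System.out.println("correlation distance: " + distance(g1, g2, true));
    System.out.println("Euclidean distance:   " + distance(g1, g2, false));
  }
}
```

With 1 - r, vectors that differ only in scale, such as (1,2,3) and (2,4,6), are at distance 0, while their Euclidean distance is sqrt(14); this is why the choice of metric changes which genes cluster together.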

set_hierClusterUnWeightedAvgFlag

public final void set_hierClusterUnWeightedAvgFlag(boolean value)
set_hierClusterUnWeightedAvgFlag() - set hierarchical-clustering averaging method flag. If true, use UPGMA (unweighted pair group method with arithmetic mean), else use WPGMA (weighted pair group method with arithmetic mean) for hierarchical clustering averaging.
Parameters:
value - to set flag
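
The two averaging rules differ in how they update the distance from a newly merged cluster (i and j) to another cluster m. The sketch below shows the textbook UPGMA and WPGMA update formulas; it is not taken from MAExplorer, and PgmaUpdateSketch and mergedDistance() are hypothetical names.

```java
/** Hypothetical sketch of the UPGMA vs WPGMA distance update after merging clusters i and j. */
final class PgmaUpdateSketch {

  /**
   * Distance from the merged cluster (i U j) to another cluster m.
   * unweighted == true  -> UPGMA: average weighted by cluster sizes, so every gene counts equally.
   * unweighted == false -> WPGMA: plain mean of the two distances, so each subcluster counts equally.
   */
  static double mergedDistance(double dIm, double dJm, int nI, int nJ, boolean unweighted) {
    if (unweighted) {
      return (nI * dIm + nJ * dJm) / (double) (nI + nJ);
    }
    return (dIm + dJm) / 2.0;
  }

  public static void main(String[] args) {
    // Cluster i: 3 genes at distance 1.0 from m; cluster j: 1 gene at distance 5.0 from m.
    System.out.println("UPGMA: " + mergedDistance(1.0, 5.0, 3, 1, true));
    System.out.println("WPGMA: " + mergedDistance(1.0, 5.0, 3, 1, false));
  }
}
```

In the example, UPGMA gives (3*1.0 + 1*5.0)/4 = 2.0, while WPGMA gives (1.0 + 5.0)/2 = 3.0, so the larger subcluster dominates only under UPGMA.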

get_clusterOnFilteredGenesFlag

public final boolean get_clusterOnFilteredGenesFlag()
get_clusterOnFilteredGenesFlag() - get flag to cluster on Filtered genes else All genes.
Returns:
value of flag

get_useSimilarGeneClusterDisplayFlag

public final boolean get_useSimilarGeneClusterDisplayFlag()
get_useSimilarGeneClusterDisplayFlag() - get flag indicating whether the similar-genes cluster method is active.
Returns:
value of flag

get_useClusterCountsDisplayFlag

public final boolean get_useClusterCountsDisplayFlag()
get_useClusterCountsDisplayFlag() - get flag indicating whether the gene cluster counts method is active.
Returns:
value of flag

get_normHierClusterByRatioHPFlag

public final boolean get_normHierClusterByRatioHPFlag()
get_normHierClusterByRatioHPFlag() - get flag for normalizing each HP-E gene expression vector dataV[1:nHP] by its HP-X value dataV[HP-X] (ratio to HP-X). Otherwise, normalize each element by the maximum value for each sample.
Returns:
value of flag
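
A minimal sketch of the two normalizations this flag selects between. It assumes the data are laid out as data[gene][sample] with 0-based indices, that the HP-X sample is column hpX, and that "maximum value for each sample" means the maximum over all genes in that sample; EpNormalizationSketch and normalize() are hypothetical names.

```java
/** Hypothetical sketch of ratio-to-HP-X vs per-sample-maximum normalization. */
final class EpNormalizationSketch {

  /**
   * data[g][s] is the intensity of gene g in HP-E sample s (0-based).
   * byRatioHP == true : divide each gene's vector by its own HP-X value data[g][hpX].
   * byRatioHP == false: divide each element by the maximum of its sample column.
   * A zero divisor yields 0 (a choice made for this sketch).
   */
  static float[][] normalize(float[][] data, int hpX, boolean byRatioHP) {
    int nGenes = data.length;
    int nSamples = data[0].length;
    // Per-sample maximum over all genes, used only when byRatioHP is false.
    float[] sampleMax = data[0].clone();
    for (float[] row : data) {
      for (int s = 0; s < nSamples; s++) {
        sampleMax[s] = Math.max(sampleMax[s], row[s]);
      }
    }
    float[][] out = new float[nGenes][nSamples];
    for (int g = 0; g < nGenes; g++) {
      for (int s = 0; s < nSamples; s++) {
        float divisor = byRatioHP ? data[g][hpX] : sampleMax[s];
        out[g][s] = (divisor != 0f) ? data[g][s] / divisor : 0f;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    float[][] data = {{2f, 4f}, {4f, 8f}};
    System.out.println(java.util.Arrays.deepToString(normalize(data, 0, true)));
    System.out.println(java.util.Arrays.deepToString(normalize(data, 0, false)));
  }
}
```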

get_useMedianForKmeansClusteringFlag

public final boolean get_useMedianForKmeansClusteringFlag()
get_useMedianForKmeansClusteringFlag() - get flag for K-median else K-means clustering.
Returns:
value of flag
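
As far as this flag describes, the two modes differ in how a cluster center is summarized. The sketch below computes a per-sample mean (K-means) or median (K-median) of the member vectors; whether MAExplorer computes its K-median center this way is an assumption, and CentroidSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-sample cluster center as a mean (K-means) or median (K-median). */
final class CentroidSketch {

  static float[] centroid(float[][] members, boolean useMedian) {
    int nSamples = members[0].length;
    float[] center = new float[nSamples];
    float[] column = new float[members.length];
    for (int s = 0; s < nSamples; s++) {
      for (int g = 0; g < members.length; g++) {
        column[g] = members[g][s];
      }
      if (useMedian) {
        float[] sorted = column.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle value; even count: mean of the two middle values.
        center[s] = (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2f;
      } else {
        float sum = 0f;
        for (float v : column) {
          sum += v;
        }
        center[s] = sum / column.length;
      }
    }
    return center;
  }

  public static void main(String[] args) {
    float[][] members = {{1f}, {2f}, {9f}};
    System.out.println("mean:   " + Arrays.toString(centroid(members, false)));
    System.out.println("median: " + Arrays.toString(centroid(members, true)));
  }
}
```

With members 1, 2 and 9, the mean is 4.0 but the median is 2.0, so a single outlying gene moves a K-means center much more than a K-median center.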

get_useHierClusterDisplayFlag

public final boolean get_useHierClusterDisplayFlag()
get_useHierClusterDisplayFlag() - get flag to show hierarchical clusters in the pseudoarray image for all Filtered genes.
Returns:
value of flag

get_useKmeansClusterCountsDispFlag

public final boolean get_useKmeansClusterCountsDispFlag()
get_useKmeansClusterCountsDispFlag() - get flag to show K-means cluster counts for all Filtered genes in the pseudoarray image.
Returns:
value of flag

get_useLSQmagnitudeNormalizationFlag

public final boolean get_useLSQmagnitudeNormalizationFlag()
get_useLSQmagnitudeNormalizationFlag() - get flag to normalize cluster expression vector geneEPvect[] to a magnitude of 1.0 for clustering.
Returns:
value of flag
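
A minimal sketch, assuming "LSQ magnitude" means the root-sum-of-squares (Euclidean) length of the vector; UnitMagnitudeSketch and toUnitMagnitude() are hypothetical names, and leaving an all-zero vector unchanged is a choice made for this sketch.

```java
/** Hypothetical sketch: scale a vector so its root-sum-of-squares magnitude is 1.0. */
final class UnitMagnitudeSketch {

  static float[] toUnitMagnitude(float[] v) {
    double sumSq = 0.0;
    for (float x : v) {
      sumSq += (double) x * x;
    }
    float[] out = new float[v.length];
    if (sumSq == 0.0) {
      return out; // an all-zero vector is left as zeros
    }
    double magnitude = Math.sqrt(sumSq);
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / magnitude);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(toUnitMagnitude(new float[] {3f, 4f})));
  }
}
```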

get_useClusterDistanceCacheFlag

public final boolean get_useClusterDistanceCacheFlag()
get_useClusterDistanceCacheFlag() - get flag to use cluster distance cache to speed up computations.
Returns:
value of flag

get_useShortClusterDistanceCacheFlag

public final boolean get_useShortClusterDistanceCacheFlag()
get_useShortClusterDistanceCacheFlag() - get flag to use short[] cluster distance cache to save memory.
Returns:
value of flag
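
The memory saving comes from storing each distance in a 2-byte short instead of a 4-byte float. One way to do that, sketched below, is to scale distances in [0, maxDist] onto [0, Short.MAX_VALUE]; this quantization scheme is an assumption, not necessarily the one MAExplorer uses, and ShortDistanceCacheSketch is a hypothetical name.

```java
/**
 * Hypothetical sketch: store distances as 16-bit shorts (2 bytes each) instead of
 * 32-bit floats (4 bytes each), trading precision for half the memory.
 */
final class ShortDistanceCacheSketch {
  private final short[] cache;
  private final float scale; // maps [0, maxDist] onto [0, Short.MAX_VALUE]

  ShortDistanceCacheSketch(float[] distances, float maxDist) {
    scale = (maxDist > 0f) ? Short.MAX_VALUE / maxDist : 1f;
    cache = new short[distances.length];
    for (int i = 0; i < distances.length; i++) {
      // Clamp into [0, maxDist] so the scaled value always fits in a short.
      float clamped = Math.min(Math.max(distances[i], 0f), Math.max(maxDist, 0f));
      cache[i] = (short) Math.round(clamped * scale);
    }
  }

  /** Approximate distance; the rounding error is at most about maxDist / 65534. */
  float get(int i) {
    return cache[i] / scale;
  }

  public static void main(String[] args) {
    ShortDistanceCacheSketch c = new ShortDistanceCacheSketch(new float[] {0f, 0.5f, 1f}, 1f);
    System.out.println(c.get(1));
  }
}
```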

getKmeansClusters

public final java.util.Hashtable getKmeansClusters(java.lang.String geneListToCluster,
                                                   int nbrOfClusters,
                                                   int initialSeedGeneMID)
getKmeansClusters() - get Hashtable list of K-means clustering data. Only genes passing the data Filter are clustered and used in computing the data.
 The Hashtable list returned is defined as:
 Name                     - Value
 "NbrClusters"            - int number of clusters
 "NbrSamples"             - int number of HP-E samples
 "NbrGenes"               - int number of Genes clustered
 "NameGeneListToCluster"  - String name of GeneList set to cluster
 "InitialSeedGeneMID"     - int initial seed gene MID to start cluster 1
 "maxGlobalDist"          - float max Global cluster distance for
                            all genes
 "kMeansDist"             - float[1:NbrClusters] distance between K-means
                            clusters
 "kMeansMaxDist"          - float[1:NbrClusters] max distance within each
                            K-means cluster
 "mnWithinClusterDist"    - float[1:NbrClusters] mean within-cluster
                            distance for each K-means cluster
 "sdWithinClusterDist"    - float[1:NbrClusters] StdDev within-cluster
                            distance for each K-means cluster
 "kMeansList"             - int[1:NbrClusters] index of gene as K-means
                            cluster center
 "hpDataNbrA"             - int [0:NbrClusters-1] opt # genes/K-means
                            cluster for HP-E samples
 "hpDataMnA"              - float[0:NbrClusters-1][0:NbrSamples-1] opt
                            Mean HP-E quant data
 "hpDataSDA"              - float[0:NbrClusters-1][0:NbrSamples-1] opt.
                            StdDev HP-E quant data
 "nClustersFound"         - int # of unique K-means clusters
                            actually found
 "maxKmeansNodes"         - int max # of K-means clusters set by
                            thresholding slider
 "geneEPvector"           - float[0:NbrGenes-1][0:NbrSamples-1]
                            normalized quantified gene vector
 "ClusterMeansGeneList"   - Hashtable list GeneList of means of clusters
 "CurClusterGeneList"     - Hashtable list GeneList of genes in current
                            cluster
 

Parameters:
geneListToCluster - name of gene list with genes to cluster
nbrOfClusters - # of clusters to generate
initialSeedGeneMID - initial seed gene specified by MID
Returns:
Hashtable list else null if error.
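
To illustrate the roles of nbrOfClusters and initialSeedGeneMID, here is a minimal sketch of seeded K-means (Lloyd's algorithm) on expression vectors. It assumes Euclidean distance, starts cluster 0 at the seed gene, picks the other initial centers by farthest-point selection, and uses 0-based row indices rather than MIDs; none of these choices is confirmed by the description above (the "kMeansList" entry, for example, suggests MAExplorer's centers are actual genes), and SeededKmeansSketch is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of seeded K-means (Lloyd's algorithm) on gene expression vectors.
 * Not MAExplorer's implementation: initialization and update rules are assumptions.
 */
final class SeededKmeansSketch {

  static double dist2(float[] a, float[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns the cluster number [0:k-1] assigned to each gene row. */
  static int[] cluster(float[][] genes, int k, int seedIndex, int maxIter) {
    int n = genes.length;
    int m = genes[0].length;
    float[][] centers = new float[k][];
    centers[0] = genes[seedIndex].clone(); // cluster 0 starts at the seed gene
    // Farthest-point initialization: each new center is the gene farthest from its nearest center.
    for (int c = 1; c < k; c++) {
      int best = 0;
      double bestDist = -1.0;
      for (int g = 0; g < n; g++) {
        double nearest = Double.MAX_VALUE;
        for (int j = 0; j < c; j++) {
          nearest = Math.min(nearest, dist2(genes[g], centers[j]));
        }
        if (nearest > bestDist) {
          bestDist = nearest;
          best = g;
        }
      }
      centers[c] = genes[best].clone();
    }
    int[] assign = new int[n];
    Arrays.fill(assign, -1);
    for (int iter = 0; iter < maxIter; iter++) {
      // Assignment step: each gene joins its nearest center.
      boolean changed = false;
      for (int g = 0; g < n; g++) {
        int bestC = 0;
        for (int c = 1; c < k; c++) {
          if (dist2(genes[g], centers[c]) < dist2(genes[g], centers[bestC])) {
            bestC = c;
          }
        }
        if (assign[g] != bestC) {
          assign[g] = bestC;
          changed = true;
        }
      }
      if (!changed) {
        break;
      }
      // Update step: move each non-empty cluster's center to the mean of its members.
      for (int c = 0; c < k; c++) {
        float[] sum = new float[m];
        int count = 0;
        for (int g = 0; g < n; g++) {
          if (assign[g] == c) {
            count++;
            for (int s = 0; s < m; s++) {
              sum[s] += genes[g][s];
            }
          }
        }
        if (count > 0) {
          for (int s = 0; s < m; s++) {
            sum[s] /= count;
          }
          centers[c] = sum;
        }
      }
    }
    return assign;
  }

  public static void main(String[] args) {
    float[][] genes = {{0f, 0f}, {0f, 1f}, {10f, 10f}, {10f, 11f}};
    System.out.println(Arrays.toString(cluster(genes, 2, 0, 100)));
  }
}
```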

getClusterOfSimilarGenes

public final java.util.Hashtable getClusterOfSimilarGenes(java.lang.String geneListToCluster,
                                                          float curGeneDistanceThr,
                                                          int initialSeedGeneMID)
getClusterOfSimilarGenes() - get Hashtable list of the cluster of genes similar to a seed gene. Only genes passing the data Filter are clustered and used in computing the data. The similar genes are also copied to the Edited Gene list.
 The Hashtable list returned is defined as:
 Name                   - Value
 "NbrSamples"               - int number of HP-E samples
 "NbrGenes"                 - int number of Genes clustered
 "NameGeneListToCluster"    - String name of GeneList set to cluster
 "InitialSeedGeneMID"       - int initial seed gene MID to start cluster
 "maxGlobalDist"            - float max Global cluster distance for all
                              genes
 "curGeneDistanceThr"       - float threshold distance for similar genes
 "NbrSimilarGenesInCluster" - int # genes in cluster similar to seed gene
 "CurClusterGeneList"       - Hashtable of GeneList of genes in current
                              cluster
 

Parameters:
geneListToCluster - name of gene list with genes to cluster
curGeneDistanceThr - threshold distance for similar genes
initialSeedGeneMID - initial seed gene specified by MID
Returns:
null if not found or error, else return Hashtable list.
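
A minimal sketch of the selection step, assuming Euclidean distance and that a gene is "similar" when its distance to the seed gene is <= curGeneDistanceThr; SimilarGenesSketch and similarTo() are hypothetical names, and genes are identified by 0-based row index rather than MID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: genes within a Euclidean distance threshold of a seed gene. */
final class SimilarGenesSketch {

  static List<Integer> similarTo(float[][] genes, int seedIndex, double distanceThr) {
    List<Integer> similar = new ArrayList<>();
    float[] seed = genes[seedIndex];
    for (int g = 0; g < genes.length; g++) {
      double sum = 0.0;
      for (int s = 0; s < seed.length; s++) {
        double d = genes[g][s] - seed[s];
        sum += d * d;
      }
      if (Math.sqrt(sum) <= distanceThr) {
        similar.add(g); // the seed itself is at distance 0, so it is always included
      }
    }
    return similar;
  }

  public static void main(String[] args) {
    float[][] genes = {{0f, 0f}, {0f, 1f}, {5f, 5f}};
    System.out.println(similarTo(genes, 0, 1.5));
  }
}
```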

getHierClusterOfGenes

public final java.util.Hashtable getHierClusterOfGenes(java.lang.String geneListToCluster)
getHierClusterOfGenes() - get Hashtable list of the hierarchical cluster of genes. Genes are clustered by their expression profiles.
 The Hashtable list returned is defined as:
 Name                   - Value
 "NbrSamples"               - int number of HP-E samples
 "NbrGenes"                 - int number of Genes clustered
 "NameGeneListToCluster"    - String name of GeneList set to cluster
 "NormByRatioHPflag"        - boolean flag: normalize EP data before
                              clustering by "HP-X Sample data/gene"
                              else by "HP max intensities data"
 "HierClusterUnWtAvgFlag"   - boolean flag: "unweighted-avg"
                              else "weighted-avg"
 "HierClusterMode"          - int cluster linkage mode. Either
                              HIER_CLUST_NEXT_MIN_LNKG,
                              HIER_CLUST_PGMA_LNKG or HIER_CLUST_PGMC_LNKG
 "CurClusterGeneList"       - Hashtable list GeneList of genes in current
                              cluster in order of hierarchical cluster.
 "NbrNodes"                 - int number of nodes in the cluster
                              (2*NbrGenes-1)
 "MaxDistLR"                - float max distance (squared) between any
                              Left and Right children found over all nodes
 "TreeEnumeration"          - Hashtable enumeration of the hierarchical
                              cluster

 The TreeEnumeration Hashtable list is an enumeration
 of the cluster tree and is defined as:
 Name                   - Value
 "GeneMID"                  - int Gene MID index if it is a terminal
                              node else -1
 "NodeID"                   - int unique cluster node #
 "EnumOrder"                - float enumeration order in range
                              [0:NbrNodes-1].
                              Intermediate nodes may be fractional, e.g.
                              (LeftChildNode.enumOrder +
                               RightChildNode.enumOrder)/2.0
 "ParentNodeID"             - int if not -1, then parent node
 "LeftChildNodeID"          - int if not -1, then left node
 "RightChildNodeID"         - int distance between left and right
                              children nodes
 "MeanEPdataForNode"        - mean vector float [0:NbrSamples-1] of left
                              and right children nodes
 "NbrChildren"              - int # of children of Node
 

Parameters:
geneListToCluster - name of gene list with genes to cluster
Returns:
null if not found or error, else return Hashtable list.
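
One reading of the EnumOrder rule above is sketched below: terminal nodes are numbered consecutively from left to right, and each intermediate node takes the mean of its two children's orders. This is an assumption; in particular, it numbers only the leaves, so its values span [0:NbrGenes-1] rather than the full [0:NbrNodes-1] range stated above. TreeEnumOrderSketch and assignEnumOrder() are hypothetical names.

```java
/** Hypothetical sketch of one way to assign TreeEnumeration "EnumOrder" values. */
final class TreeEnumOrderSketch {

  static final class Node {
    final int geneMID; // -1 for intermediate nodes, as in the TreeEnumeration table
    final Node left;
    final Node right;
    float enumOrder;

    Node(int geneMID) {
      this(geneMID, null, null);
    }

    Node(Node left, Node right) {
      this(-1, left, right);
    }

    private Node(int geneMID, Node left, Node right) {
      this.geneMID = geneMID;
      this.left = left;
      this.right = right;
    }
  }

  /**
   * Leaves (terminal genes) get consecutive orders from left to right; each intermediate
   * node gets (left.enumOrder + right.enumOrder) / 2. Returns the next unused leaf order.
   */
  static int assignEnumOrder(Node node, int nextLeafOrder) {
    if (node.left == null && node.right == null) {
      node.enumOrder = nextLeafOrder;
      return nextLeafOrder + 1;
    }
    nextLeafOrder = assignEnumOrder(node.left, nextLeafOrder);
    nextLeafOrder = assignEnumOrder(node.right, nextLeafOrder);
    node.enumOrder = (node.left.enumOrder + node.right.enumOrder) / 2.0f;
    return nextLeafOrder;
  }

  public static void main(String[] args) {
    Node ab = new Node(new Node(10), new Node(11));
    Node root = new Node(ab, new Node(12));
    assignEnumOrder(root, 0);
    System.out.println(ab.enumOrder + " " + root.enumOrder);
  }
}
```

For the tree ((A,B),C), the leaves A, B and C get 0, 1 and 2, the (A,B) node gets 0.5, and the root gets (0.5 + 2)/2 = 1.25.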

addr1D

public final int addr1D(int x,
                        int y)
addr1D() - look up lower-triangular address addr1D(x,y) [y' + x'*(x'+1)/2]
 where:
    x' = (x>y) ? x : y,
    y' = (x>y) ? y : x.
Parameters:
x - coordinate
y - coordinate
Returns:
the computed 1D address for the 2D (x,y) address.
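
The formula above translates directly into Java. The class name LowerTriangleSketch and the helper packedLength() are illustrative additions, not part of the MJAcluster API.

```java
/** Sketch of the packed lower-triangular addressing described for addr1D(). */
final class LowerTriangleSketch {

  /** 1D address of (x,y) in a packed lower triangle that includes the diagonal. */
  static int addr1D(int x, int y) {
    int xp = (x > y) ? x : y; // x' = the larger index
    int yp = (x > y) ? y : x; // y' = the smaller index
    return yp + xp * (xp + 1) / 2;
  }

  /** Number of entries needed to store an n x n symmetric matrix this way. */
  static int packedLength(int n) {
    return n * (n + 1) / 2;
  }

  public static void main(String[] args) {
    // Row x' of the triangle starts at x'*(x'+1)/2: 0, 1, 3, 6, ...
    for (int x = 0; x < 4; x++) {
      StringBuilder row = new StringBuilder();
      for (int y = 0; y <= x; y++) {
        row.append(addr1D(x, y)).append(' ');
      }
      System.out.println(row.toString().trim());
    }
  }
}
```

Because addr1D(x,y) == addr1D(y,x), an n-gene symmetric distance matrix needs only n*(n+1)/2 entries instead of n*n. Note that x'*(x'+1) overflows a Java int once x' exceeds about 46,340.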

computeGeneGeneDistanceMatrix

public final float[] computeGeneGeneDistanceMatrix(java.lang.String geneListToUse)
computeGeneGeneDistanceMatrix() - return gene-gene cluster distance matrix. This is encoded as a lower-triangular matrix packed into a 1D array; for genes x and y, use addr1D(x,y) to access the data.
     addr1D(x,y) = [y' + x'*(x'+1)/2]
 where:
    x' = (x>y) ? x : y,
    y' = (x>y) ? y : x.
Parameters:
geneListToUse - the name of the gene list to filter by
Returns:
matrix if it exists, else return null.
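
A minimal sketch of how such a packed matrix can be filled, assuming Euclidean distance between rows of a genes[gene][sample] array; PackedDistanceMatrixSketch and computeDistances() are hypothetical names, and the gene-list filtering done by the real method is omitted.

```java
/** Hypothetical sketch: Euclidean gene-gene distances packed with addr1D(). */
final class PackedDistanceMatrixSketch {

  static int addr1D(int x, int y) {
    int xp = (x > y) ? x : y;
    int yp = (x > y) ? y : x;
    return yp + xp * (xp + 1) / 2;
  }

  /** genes[g][s] = expression of gene g in sample s; returns n*(n+1)/2 distances. */
  static float[] computeDistances(float[][] genes) {
    int n = genes.length;
    float[] dist = new float[n * (n + 1) / 2];
    for (int x = 0; x < n; x++) {
      for (int y = 0; y <= x; y++) { // lower triangle only; (y,x) shares the same slot
        double sum = 0.0;
        for (int s = 0; s < genes[x].length; s++) {
          double d = genes[x][s] - genes[y][s];
          sum += d * d;
        }
        dist[addr1D(x, y)] = (float) Math.sqrt(sum);
      }
    }
    return dist;
  }

  public static void main(String[] args) {
    float[][] genes = {{0f, 0f}, {3f, 4f}, {0f, 1f}};
    float[] dist = computeDistances(genes);
    System.out.println("d(0,1) = " + dist[addr1D(0, 1)]);
  }
}
```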

calcNormGeneVectors

public final float[][] calcNormGeneVectors(java.lang.String geneListToNormalize)
calcNormGeneVectors() - compute the HP-E intensity vectors for the genes in geneListToNormalize. Each vector is normalized by its first entry. [TODO] add notes on normalization options set in the menus and readable using flag query methods here.
Parameters:
geneListToNormalize - gene list to normalize.
Returns:
normalized vectors as float[0:nbrGenesInGeneList-1][0:nbrSamples-1].

findGeneWithLeastSumDistances

public final int findGeneWithLeastSumDistances(float[] ccDist1D,
                                               int nGenes)
findGeneWithLeastSumDistances() - find the gene with the minimum sum of cluster distances to all other genes.
Parameters:
ccDist1D - gene-gene distance computed with computeGeneGeneDistanceMatrix()
nGenes - the number of genes in the gene-gene matrix.
Returns:
the gene MID index if it exists, else -1 on error.
See Also:
computeGeneGeneDistanceMatrix(java.lang.String)
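
A minimal sketch of the search over a matrix packed with addr1D(). It returns a 0-based matrix row; the real method returns a gene MID, and the row-to-MID mapping is omitted here because it is not described above. LeastSumDistanceSketch is a hypothetical name.

```java
/** Hypothetical sketch: find the row whose summed distance to all other rows is smallest. */
final class LeastSumDistanceSketch {

  static int addr1D(int x, int y) {
    int xp = (x > y) ? x : y;
    int yp = (x > y) ? y : x;
    return yp + xp * (xp + 1) / 2;
  }

  /** Returns the 0-based matrix row of the best gene, or -1 on bad input. */
  static int findGeneWithLeastSumDistances(float[] ccDist1D, int nGenes) {
    if (ccDist1D == null || nGenes <= 0 || ccDist1D.length < nGenes * (nGenes + 1) / 2) {
      return -1;
    }
    int best = -1;
    double bestSum = Double.POSITIVE_INFINITY;
    for (int x = 0; x < nGenes; x++) {
      double sum = 0.0;
      for (int y = 0; y < nGenes; y++) {
        if (y != x) {
          sum += ccDist1D[addr1D(x, y)];
        }
      }
      if (sum < bestSum) { // strict '<' keeps the lowest index on ties
        bestSum = sum;
        best = x;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Packed distances for three genes at 1D positions 0, 1 and 3.
    float[] ccDist1D = {0f, 1f, 0f, 3f, 2f, 0f};
    System.out.println(findGeneWithLeastSumDistances(ccDist1D, 3));
  }
}
```

In the example, the summed distances are 4, 3 and 5, so the middle gene (row 1) is selected.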