Class Statistics

java.lang.Object
  |
  +--Statistics

class Statistics
extends java.lang.Object

The class contains various parametric and non-parametric probability statistics methods. These include: F-test, t-test, p-values for t-test, histogram, mean, stdDev, median, mode, extrema, etc.

NOTE: Statistics package (derived from WebGel and GELLAB-II which were derived from Numerical Recipes, etc.).

This work was produced by Peter Lemkin of the National Cancer Institute, an agency of the United States Government. As a work of the United States Government there is no associated copyright. It is offered as open source software under the Mozilla Public License (version 1.1) subject to the limitations noted in the accompanying LEGAL file. This notice must be included with the code. The MAExplorer Mozilla and Legal files are available on http://maexplorer.sourceforge.net/.

Version:
$Date: 2003/02/20 21:35:04 $ $Revision: $
Author:
P. Lemkin (NCI), G. Thornwall (SAIC), NCI-Frederick, Frederick, MD
See Also:
MAExplorer Home

Field Summary
(package private)  float[][] condData
          data[0:nConditions-1][sampleNbrInClass] for computations, set by calcMeanAndVariance()
(package private)  float deltaBinH
          width of the histogram bins computed as: nBinsH/(maxDataH-minDataH)
(package private)  double dF
          CALC: degrees of freedom of t-test or F-test (2 conditions) set by calcTandPvalues()
(package private)  double dfBetween
          CALC: degrees of freedom dfBetween, f-statistic nConditions test set by calcNCondFtestStat()
(package private)  double dFks
          CALC: degrees of freedom for KS-test set by calcKStestStat()
(package private)  double dfWithin
          CALC: degrees of freedom dfWithin, f-statistic nConditions test set by calcNCondFtestStat()
(package private)  double f
          CALC: calculated f statistic set by calcTandPvalues()
(package private)  double fStat
          CALC: f-statistic 2 conditions
(package private)  double fStatNconds
          CALC: f-statistic nConditions test set by calcNCondFtestStat()
(package private)  int[] hist
          histogram of size [0:nBinsH-1] set by calcHistStats()
(package private)  double ksD
          CALC: KS-test Kolmogorov-Smirnov D statistic set by calcKStestStat()
private  MAExplorer mae
          link to global instance of MAExplorer
(package private)  float maxDataH
          histogram data max value set by calcHistStats()
(package private)  double[] mean
          means[0:nConditions-1] of data for computations, set by calcMeanAndVariance()
(package private)  float meanAbsDevH
          histogram data mean absolute deviation set by calcHistStats()
(package private)  float meanH
          histogram data mean set by calcHistStats()
(package private)  int meanIdx
          index of mean in hist[] set by calcHistStats()
(package private)  float medianH
          histogram data median set by calcHistStats()
(package private)  int medianIdx
          index of median in hist[] set by calcHistStats()
(package private)  float minDataH
          histogram data min value set by calcHistStats()
(package private)  double mnSqBetween
          CALC: mean square between variance, f-statistic nConditions test set by calcNCondFtestStat()
(package private)  double mnSqWithin
          CALC: mean square within variance, f-statistic nConditions test set by calcNCondFtestStat()
(package private)  float modeH
          histogram data mode set by calcHistStats()
(package private)  int modeIdx
          index of mode in hist[] set by calcHistStats()
(package private)  int nBinsH
          size of histogram set by calcHistStats(); 0 if none.
private  int nCondAlloc
          size of the mean, variance, stdDev, condData, nCondData stats arrays set by calcNCondFtestStat()
(package private)  int[] nCondData
          # of samples [0:nConditions-1] in each condition of condData, set by calcMeanAndVariance()
(package private)  int nConditions
          # of conditions in mean,var stats set by calcNCondFtestStat()
(package private)  double pF
          CALC: f-test p-value w/NULL hypoth set by calcTandPvalues()
(package private)  double pFnConds
          CALC: f-test p-value w/NULL hypoth for nConditions set by calcNCondFtestStat()
(package private)  double pKS
          CALC: KS-test p-value w/NULL hypoth set by calcKStestStat()
(package private)  double pT
          CALC: t-test p-value w/NULL hypoth set by calcTandPvalues()
(package private)  double[] stdDev
          stdDev[0:nConditions-1] of data for computations, set by calcMeanAndVariance() if calcStdDevFlag set on call
(package private)  float stdDevH
          histogram data stdDev set by calcHistStats()
(package private)  double t
          CALC: t or t' statistic set by calcTandPvalues()
(package private)  java.lang.String title
          for data used in histogram set by calcHistStats()
(package private)  char useTest
          CALC: 'B' or 'T' - t-test to use set by calcTandPvalues()
(package private)  double[] variance
          variance[0:nConditions-1] of data for computations, set by calcMeanAndVariance()
 
Constructor Summary
(package private) Statistics()
          Statistics() - constructor
(package private) Statistics(MAExplorer mae)
          Statistics() - constructor
 
Method Summary
(package private)  boolean calcFprobFromVariances(int n1, int n2, double var1, double var2)
          calcFprobFromVariances() - calculate 2-tailed f probability that the variances are the same.
(package private)  int calcHistStats(java.lang.String title, int nBins, float[] data, int nData)
          calcHistStats() - compute and analyze histogram generating statistics.
(package private)  int calcHistStats(java.lang.String title, int nBins, float[] data, int nData, int[] hist)
          calcHistStats() - compute and analyze a histogram for whatever range of data is given.
(package private)  boolean calcKStestStat(double[] data1, int n1, double[] data2, int n2)
          calcKStestStat() - calculate Kolmogorov-Smirnov ksD, pKS, dFks from (n1,data1) and (n2,data2).
(package private)  boolean calcMeanAndVariance(float[] dataS, int nSamples, int conditionK, boolean calcStdDevFlag)
          calcMeanAndVariance() - compute the mean and variance of dataS[] for 1 condition and save the results in the mean[conditionK] and variance[conditionK] arrays.
(package private)  boolean calcNCondFtestStat(float[][] data, int[] nData, int nConditions)
          calcNCondFtestStat() - calculate F-test statistics of data[0:nConditions-1][samples].
(package private)  boolean calcTandPvalues(int n1, int n2, double m1, double m2, double s1, double s2)
          calcTandPvalues() - calculate f, t, p, dF from (n1,m1,s1) and (n2,m2,s2).
(package private)  boolean calcWMWtestStat(int n1, int n2, double m1, double m2, double s1, double s2)
          calcWMWtestStat() - calculate WMW statistics from (n1,m1,s1) and (n2,m2,s2).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

mae

private MAExplorer mae
link to global instance of MAExplorer

pT

double pT
CALC: t-test p-value w/NULL hypoth set by calcTandPvalues()

pF

double pF
CALC: f-test p-value w/NULL hypoth set by calcTandPvalues()

pKS

double pKS
CALC: KS-test p-value w/NULL hypoth set by calcKStestStat()

pFnConds

double pFnConds
CALC: f-test p-value w/NULL hypoth for nConditions set by calcNCondFtestStat()

f

double f
CALC: calculated f statistic set by calcTandPvalues()

t

double t
CALC: t or t' statistic set by calcTandPvalues()

fStat

double fStat
CALC: f-statistic 2 conditions

ksD

double ksD
CALC: KS-test Kolmogorov-Smirnov D statistic set by calcKStestStat()

dF

double dF
CALC: degrees of freedom of t-test or F-test (2 conditions) set by calcTandPvalues()

dFks

double dFks
CALC: degrees of freedom for KS-test set by calcKStestStat()

useTest

char useTest
CALC: 'B' or 'T' - t-test to use set by calcTandPvalues()

fStatNconds

double fStatNconds
CALC: f-statistic nConditions test set by calcNCondFtestStat()

mnSqWithin

double mnSqWithin
CALC: mean square within variance, f-statistic nConditions test set by calcNCondFtestStat()

mnSqBetween

double mnSqBetween
CALC: mean square between variance, f-statistic nConditions test set by calcNCondFtestStat()

dfWithin

double dfWithin
CALC: degrees of freedom dfWithin, f-statistic nConditions test set by calcNCondFtestStat()

dfBetween

double dfBetween
CALC: degrees of freedom dfBetween, f-statistic nConditions test set by calcNCondFtestStat()

title

java.lang.String title
for data used in histogram set by calcHistStats()

meanIdx

int meanIdx
index of mean in hist[] set by calcHistStats()

medianIdx

int medianIdx
index of median in hist[] set by calcHistStats()

modeIdx

int modeIdx
index of mode in hist[] set by calcHistStats()

nBinsH

int nBinsH
size of histogram set by calcHistStats(); 0 if none.

hist

int[] hist
histogram of size [0:nBinsH-1] set by calcHistStats()

medianH

float medianH
histogram data median set by calcHistStats()

modeH

float modeH
histogram data mode set by calcHistStats()

meanH

float meanH
histogram data mean set by calcHistStats()

stdDevH

float stdDevH
histogram data stdDev set by calcHistStats()

meanAbsDevH

float meanAbsDevH
histogram data mean absolute deviation set by calcHistStats()

minDataH

float minDataH
histogram data min value set by calcHistStats()

maxDataH

float maxDataH
histogram data max value set by calcHistStats()

deltaBinH

float deltaBinH
width of the histogram bins computed as: nBinsH/(maxDataH-minDataH)

nCondAlloc

private int nCondAlloc
size of the mean, variance, stdDev, condData, nCondData stats arrays set by calcNCondFtestStat()

nConditions

int nConditions
# of conditions in mean,var stats set by calcNCondFtestStat()

condData

float[][] condData
data[0:nConditions-1][sampleNbrInClass] for computations, set by calcMeanAndVariance()

nCondData

int[] nCondData
# of samples [0:nConditions-1] in each condition of condData, set by calcMeanAndVariance()

mean

double[] mean
means[0:nConditions-1] of data for computations, set by calcMeanAndVariance()

variance

double[] variance
variance[0:nConditions-1] of data for computations, set by calcMeanAndVariance()

stdDev

double[] stdDev
stdDev[0:nConditions-1] of data for computations, set by calcMeanAndVariance() if calcStdDevFlag set on call

Constructor Detail

Statistics

Statistics()
Statistics() - constructor

Statistics

Statistics(MAExplorer mae)
Statistics() - constructor
Parameters:
mae - is MAExplorer instance

Method Detail

calcMeanAndVariance

boolean calcMeanAndVariance(float[] dataS,
                            int nSamples,
                            int conditionK,
                            boolean calcStdDevFlag)
calcMeanAndVariance() - compute the mean and variance of dataS[] for 1 condition and save the results in the mean[conditionK] and variance[conditionK] arrays. Also save dataS in condData[conditionK] and nSamples in nCondData[conditionK]. This method was derived from Snedecor and Cochran, Statistical Methods.
Parameters:
dataS - is array of size [0:nSamples-1] of data
nSamples - is size of dataS
conditionK - is the class # associated with this data (start at 0)
calcStdDevFlag - if set, also compute stdDev[conditionK]
Returns:
true if succeed
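
The computation described above can be sketched in isolation. This is a minimal illustration, not the MAExplorer source: the class name MeanVarianceSketch and its method are hypothetical, and the use of the unbiased (n-1) divisor is an assumption, since the documentation does not say which divisor the original uses.

```java
// Hypothetical helper, not part of MAExplorer.
final class MeanVarianceSketch {

    /**
     * Returns {mean, variance, stdDev} for data[0:n-1],
     * or null if fewer than 2 samples are available.
     */
    static double[] meanVarStdDev(float[] data, int n) {
        if (data == null || n < 2 || n > data.length) {
            return null; // a sample variance needs at least 2 values
        }

        // First pass: mean.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += data[i];
        }
        double mean = sum / n;

        // Second pass: sum of squared deviations from the mean.
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = data[i] - mean;
            sumSq += d * d;
        }

        // Assumption: unbiased estimator with divisor (n - 1).
        double variance = sumSq / (n - 1);
        return new double[] { mean, variance, Math.sqrt(variance) };
    }
}
```

The two-pass form is used here because it is less sensitive to rounding than accumulating the sum and sum of squares in a single pass.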

calcFprobFromVariances

boolean calcFprobFromVariances(int n1,
                               int n2,
                               double var1,
                               double var2)
calcFprobFromVariances() - calculate 2-tailed f probability that the variances are the same. It computes:
    fStat - the f-statistic
    pF    - CALC: probability that the variances are the same
This method was derived from GELLAB-II which was derived from Numerical Recipes in C and Snedecor and Cochran, Statistical Methods.
Parameters:
n1 - # samples class 1
n2 - # samples class 2
var1 - variance of class 1
var2 - variance of class 2
Returns:
true and set variables if succeed, else false if any problems.
See Also:
MathMAE.nr_betai(double, double, double)
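
As a rough illustration of the statistic itself, the sketch below forms the variance ratio with the larger variance in the numerator, which is the convention used by the F-test in Numerical Recipes. Whether MAExplorer orders the ratio the same way is an assumption. The class FStatSketch is hypothetical, and the p-value step (which needs an incomplete beta function such as MathMAE.nr_betai) is deliberately left out.

```java
// Hypothetical helper, not part of MAExplorer.
final class FStatSketch {

    /**
     * Returns {f, dfNumerator, dfDenominator} for two sample variances,
     * or null if the inputs are invalid.
     */
    static double[] fStatistic(int n1, int n2, double var1, double var2) {
        if (n1 < 2 || n2 < 2 || var1 <= 0.0 || var2 <= 0.0) {
            return null; // need >= 2 samples per class and positive variances
        }
        // Assumption: put the larger variance on top so that f >= 1.
        if (var1 >= var2) {
            return new double[] { var1 / var2, n1 - 1, n2 - 1 };
        }
        return new double[] { var2 / var1, n2 - 1, n1 - 1 };
    }
}
```

For example, fStatistic(10, 5, 2.0, 8.0) gives f = 4.0 with 4 and 9 degrees of freedom, because the second class has the larger variance.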

calcTandPvalues

boolean calcTandPvalues(int n1,
                        int n2,
                        double m1,
                        double m2,
                        double s1,
                        double s2)
calcTandPvalues() - calculate f, t, p, dF from (n1,m1,s1) and (n2,m2,s2). Use the Behrens-Fisher/Satterthwaite estimate for t and dF if the f-test p-value is < 0.05, i.e. the variances differ significantly. Otherwise use the standard Student t-statistic with DF= (n1+n2-2). To force a particular test, set useTest to 'B' or 'T'; otherwise the method picks the test to use (i.e. TB or TP). It uses the algorithm described in Numerical Recipes in C (1st Ed) for estimating the p-value given the t-statistic using the incomplete beta function betai(). It computes:
    f - calculated f statistic
    t - t or t' statistic
    pT - t-test p-value w/NULL hypoth
    pF - f-test p-value w/NULL hypoth
    dF - degrees of freedom
 
This method was derived from GELLAB-II which was derived from Numerical Recipes in C and Snedecor and Cochran, Statistical Methods.
Parameters:
n1 - # samples in class 1
n2 - # samples in class 2
m1 - sample mean class 1
m2 - sample mean class 2
s1 - sample std dev class 1
s2 - sample std dev class 2
Returns:
false if any of the data is invalid (need >= 2 samples/class) or the beta fct fails.
See Also:
MathMAE.nr_betai(double, double, double), calcFprobFromVariances(int, int, double, double)
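
The two statistics this method chooses between can be sketched as follows. This is an illustration using the textbook formulas, not the MAExplorer code: TTestSketch and its method names are hypothetical, and the p-value computation via the incomplete beta function is omitted.

```java
// Hypothetical helper, not part of MAExplorer.
final class TTestSketch {

    /** Pooled-variance Student t. Returns {t, dF} with dF = n1 + n2 - 2, or null. */
    static double[] pooledT(int n1, int n2, double m1, double m2,
                            double s1, double s2) {
        if (n1 < 2 || n2 < 2) {
            return null; // need >= 2 samples per class
        }
        double df = n1 + n2 - 2;
        double pooledVar = ((n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2) / df;
        if (pooledVar <= 0.0) {
            return null; // t is undefined when both std devs are zero
        }
        double t = (m1 - m2) / Math.sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));
        return new double[] { t, df };
    }

    /** Unequal-variance t' with the Satterthwaite dF estimate. Returns {t', dF}, or null. */
    static double[] welchT(int n1, int n2, double m1, double m2,
                           double s1, double s2) {
        if (n1 < 2 || n2 < 2) {
            return null;
        }
        double a = s1 * s1 / n1; // squared standard error of mean 1
        double b = s2 * s2 / n2; // squared standard error of mean 2
        if (a + b <= 0.0) {
            return null;
        }
        double t = (m1 - m2) / Math.sqrt(a + b);
        double df = (a + b) * (a + b)
                  / (a * a / (n1 - 1) + b * b / (n2 - 1));
        return new double[] { t, df };
    }
}
```

When both classes have the same size and standard deviation, the two functions agree. With unequal variances the Satterthwaite dF is generally fractional and smaller than n1 + n2 - 2.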

calcWMWtestStat

boolean calcWMWtestStat(int n1,
                        int n2,
                        double m1,
                        double m2,
                        double s1,
                        double s2)
calcWMWtestStat() - calculate WMW statistics from (n1,m1,s1) and (n2,m2,s2). If enabled, do WMW-test for 2 classes. [TODO] - not available yet

Parameters:
n1 - # samples in class 1
n2 - # samples in class 2
m1 - sample mean class 1
m2 - sample mean class 2
s1 - sample std dev class 1
s2 - sample std dev class 2
Returns:
false if any of the data is invalid (need >= 2 samples/class)

calcKStestStat

boolean calcKStestStat(double[] data1,
                       int n1,
                       double[] data2,
                       int n2)
calcKStestStat() - calculate Kolmogorov-Smirnov ksD, pKS, dFks from (n1,data1) and (n2,data2). DF= (n1+n2-2). It computes:
    ksD - D statistic
    pKS - KS test p-value w/NULL hypoth
    dFks - degrees of freedom
 
This method was derived from GELLAB-II which was derived from Numerical Recipes in C and Snedecor and Cochran, Statistical Methods.
Parameters:
data1 - sample data class 1
n1 - # samples in class 1
data2 - sample data class 2
n2 - # samples in class 2
Returns:
false if any of the data is invalid (need >= 2 samples/class)
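
A minimal sketch of the two-sample D statistic follows, written in the style of the merge-walk used in Numerical Recipes. The class KSSketch is hypothetical, the inputs are copied before sorting (whether the original sorts in place is not stated), and the p-value is not computed here.

```java
// Hypothetical helper, not part of MAExplorer.
final class KSSketch {

    /**
     * Returns the Kolmogorov-Smirnov D statistic, the maximum absolute
     * difference between the two empirical cumulative distributions,
     * or -1 if the inputs are invalid.
     */
    static double ksD(double[] data1, int n1, double[] data2, int n2) {
        if (data1 == null || data2 == null || n1 < 2 || n2 < 2
                || n1 > data1.length || n2 > data2.length) {
            return -1.0;
        }
        // Sort copies so the caller's arrays are left untouched.
        double[] a = java.util.Arrays.copyOf(data1, n1);
        double[] b = java.util.Arrays.copyOf(data2, n2);
        java.util.Arrays.sort(a);
        java.util.Arrays.sort(b);

        int j1 = 0;
        int j2 = 0;
        double cdf1 = 0.0;
        double cdf2 = 0.0;
        double d = 0.0;
        // Walk both sorted samples, stepping whichever has the smaller value.
        while (j1 < n1 && j2 < n2) {
            double v1 = a[j1];
            double v2 = b[j2];
            if (v1 <= v2) {
                cdf1 = (double) (++j1) / n1;
            }
            if (v2 <= v1) {
                cdf2 = (double) (++j2) / n2;
            }
            d = Math.max(d, Math.abs(cdf2 - cdf1));
        }
        return d;
    }
}
```

Two samples with no overlap give D = 1, and two identical samples give D = 0.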

calcHistStats

int calcHistStats(java.lang.String title,
                  int nBins,
                  float[] data,
                  int nData)
calcHistStats() - compute and analyze a histogram, generating statistics, for whatever range of data is given. This computes and returns the results in variables of this instance: hist[0:nBins-1], medianH, modeH, meanH, stdDevH, meanAbsDevH, minDataH, maxDataH, deltaBinH.
Parameters:
title - for data
nBins - size of hist[]
data - of size [0:nData-1]
nData - size of data array
Returns:
nBins if successful.
See Also:
calcHistStats(java.lang.String, int, float[], int, int[])

calcHistStats

int calcHistStats(java.lang.String title,
                  int nBins,
                  float[] data,
                  int nData,
                  int[] hist)
calcHistStats() - compute and analyze a histogram for whatever range of data is given. This computes and returns the results in variables of this instance:
   hist[0:nBins-1], medianH, modeH, meanH,
   stdDevH, meanAbsDevH, minDataH, maxDataH, deltaBinH.
Note: if hist is int[nBins+1], then it will not be allocated.
Parameters:
title - for data
nBins - size of hist[]
data - of size [0:nData-1]
nData - size of data array
hist - opt. [nBins+1] else null in which case it will allocate it locally
Returns:
nBins if successful else 0.
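
The binning step can be illustrated with a short sketch. It scales each value by nBins/(max - min), the expression the field description gives for deltaBinH. The class HistogramSketch is hypothetical, and details such as clamping the maximum value into the last bin are assumptions rather than documented MAExplorer behavior.

```java
// Hypothetical helper, not part of MAExplorer.
final class HistogramSketch {

    /** Returns hist[0:nBins-1] for data[0:nData-1], or null if inputs are invalid. */
    static int[] histogram(float[] data, int nData, int nBins) {
        if (data == null || nData < 1 || nData > data.length || nBins < 1) {
            return null;
        }
        // Find the data range.
        float min = data[0];
        float max = data[0];
        for (int i = 1; i < nData; i++) {
            min = Math.min(min, data[i]);
            max = Math.max(max, data[i]);
        }

        int[] hist = new int[nBins];
        if (max == min) {
            hist[0] = nData; // degenerate range: everything lands in bin 0
            return hist;
        }

        // Scale factor mapping the data range onto [0, nBins].
        double scale = nBins / (double) (max - min);
        for (int i = 0; i < nData; i++) {
            int idx = (int) ((data[i] - min) * scale);
            if (idx >= nBins) {
                idx = nBins - 1; // assumption: the max value goes in the last bin
            }
            hist[idx]++;
        }
        return hist;
    }

    /** Returns the index of the fullest bin (first one on ties). */
    static int modeIndex(int[] hist) {
        int best = 0;
        for (int i = 1; i < hist.length; i++) {
            if (hist[i] > hist[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

With the integers 0 through 10 and 5 bins, the scale is 0.5, each bin receives two values, and the last bin receives a third because the maximum is clamped into it.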

calcNCondFtestStat

boolean calcNCondFtestStat(float[][] data,
                           int[] nData,
                           int nConditions)
calcNCondFtestStat() - calculate F-test statistics of data[0:nConditions-1][samples]. It computes:
    pFnConds - p value
    fStatNconds - f statistic
    mnSqWithin - mean within class variance
    mnSqBetween - mean between class variance
    dfBetween - degrees of freedom between conditions (df1)
    dfWithin - degrees of freedom within conditions (df2)
 
This method was derived from GELLAB-II which was derived from Numerical Recipes in C, 2nd Edition, pg 619, Sec. 14.2, and Snedecor and Cochran, Statistical Methods.

Parameters:
data - sample data[nConditions][sampleNbrInCondition]
nData - # samples in each [nConditions]
nConditions - # of conditions
Returns:
false if any of the data is invalid (need >1 sample/Condition)
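
The quantities listed above correspond to a standard one-way analysis of variance, which can be sketched as below. This is the textbook formulation, not the MAExplorer source. The class AnovaSketch is hypothetical, and the p-value (which needs the incomplete beta function) is omitted.

```java
// Hypothetical helper, not part of MAExplorer.
final class AnovaSketch {

    /**
     * Returns {f, mnSqBetween, mnSqWithin, dfBetween, dfWithin} for
     * data[0:nConditions-1][0:nData[k]-1], or null if inputs are invalid.
     */
    static double[] oneWayF(float[][] data, int[] nData, int nConditions) {
        if (data == null || nData == null || nConditions < 2
                || data.length < nConditions || nData.length < nConditions) {
            return null;
        }
        // Per-condition means and the grand mean.
        double[] mean = new double[nConditions];
        double grandSum = 0.0;
        int nTotal = 0;
        for (int k = 0; k < nConditions; k++) {
            if (nData[k] < 2 || data[k] == null || data[k].length < nData[k]) {
                return null; // need > 1 sample per condition
            }
            double sum = 0.0;
            for (int i = 0; i < nData[k]; i++) {
                sum += data[k][i];
            }
            mean[k] = sum / nData[k];
            grandSum += sum;
            nTotal += nData[k];
        }
        double grandMean = grandSum / nTotal;

        // Sums of squares between and within conditions.
        double ssBetween = 0.0;
        double ssWithin = 0.0;
        for (int k = 0; k < nConditions; k++) {
            double dm = mean[k] - grandMean;
            ssBetween += nData[k] * dm * dm;
            for (int i = 0; i < nData[k]; i++) {
                double d = data[k][i] - mean[k];
                ssWithin += d * d;
            }
        }

        double dfBetween = nConditions - 1;
        double dfWithin = nTotal - nConditions;
        double msBetween = ssBetween / dfBetween;
        double msWithin = ssWithin / dfWithin;
        if (msWithin <= 0.0) {
            return null; // f is undefined with zero within-condition variance
        }
        return new double[] { msBetween / msWithin, msBetween, msWithin,
                              dfBetween, dfWithin };
    }
}
```

For three conditions {1,2,3}, {2,3,4} and {5,6,7}, the between mean square is 13 and the within mean square is 1, so f = 13 with 2 and 6 degrees of freedom.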