Figure 2.4.3 Filter menu. The Filter menu is a cascade of data
filters. Only genes that satisfy the criteria of every enabled
filter pass. This figure shows the GeneClass filter set to "All genes
and ESTs", with the spot CV and Ratio (X/Y) range filters being set
interactively by the scroll bars on the right. The genes that pass
the filter are indicated with a red (white) circle in the array
intensity (ratio) pseudoarray image.
The Filter menu options restrict the set of genes by pre-filtering
the data with a series of cascaded filter criteria and tests. The
resulting subset of genes passing the filters is then used in the
plots, reports and other data analysis methods. Some of the filters
require additional parameters that are set by the State scrollers.
The user is automatically prompted to adjust these scrollers (a
threshold scrollers window pops up) when the filter is activated or
changed. These values may also be set from the Adjust all Filter
threshold scrollers entry in the Preferences submenu in the Edit
menu. The filters are grouped into subgroups in the following menu
according to the type of criteria (i.e. gene set membership, data
range, or statistical tests).
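The cascade described above behaves like a logical AND over the
enabled filters. The following is a minimal Python sketch of that
idea, not MAExplorer's actual code. The function apply_cascade, the
gene records, and the two example predicates with their thresholds
are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List

Gene = Dict[str, Any]
GeneFilter = Callable[[Gene], bool]


def apply_cascade(genes: List[Gene], enabled_filters: List[GeneFilter]) -> List[Gene]:
    """Return the genes that pass every enabled filter (logical AND)."""
    passing = genes
    for gene_filter in enabled_filters:
        # Each stage only sees the genes that survived the previous stages.
        passing = [g for g in passing if gene_filter(g)]
    return passing


# Hypothetical gene records and thresholds, for illustration only.
genes = [
    {"id": 1, "intensity": 500.0, "ratio": 2.5},
    {"id": 2, "intensity": 20.0, "ratio": 3.0},
    {"id": 3, "intensity": 800.0, "ratio": 1.0},
]


def intensity_filter(g: Gene) -> bool:
    return 100.0 <= g["intensity"] <= 1000.0


def ratio_filter(g: Gene) -> bool:
    return g["ratio"] >= 2.0


working_set = apply_cascade(genes, [intensity_filter, ratio_filter])
```

With no filters enabled the gene list is returned unchanged, which
matches the idea that only enabled filters restrict the working set.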
The Filter by positive intensity data submenu contains options that
specify which spot intensity values are checked when excluding
negative quantified spot data. Note: this filter only makes sense if
your data might have negative values (e.g. Affymetrix chip "Avg Diff"
data) or a background corrected value that is less than 0.0. Negative
intensity values may occur with some types of array quantification
programs. The filter is enabled by setting the "Filter by spots with
positive intensity" checkbox. In the "Check spots for positive values
mode" submenu, you may choose the samples to which the test is
applied: spots from the current HP, the single (HP-X,HP-Y) samples,
(HP-X,HP-Y) 'sets' (replicated spots), or samples in the HP-E list
selected to be used in the filter. If there are (F1,F2) or (Cy3/Cy5)
data, then each spot must meet the threshold criteria.
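As a rough illustration of the rule that each spot must meet the
criteria, the sketch below checks both channel values of a gene in
every selected sample. It is a sketch under stated assumptions: the
data layout, the function name passes_positive_intensity, and the
choice of "strictly greater than 0.0" as the meaning of positive are
assumptions, not taken from MAExplorer.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: sample name -> (F1, F2) or (Cy3, Cy5) values for one gene.
SpotPair = Tuple[float, float]


def passes_positive_intensity(spots: Dict[str, SpotPair], samples: List[str]) -> bool:
    """True if both channel values are positive in every selected sample."""
    # Assumption: "positive" means strictly greater than 0.0.
    return all(value > 0.0 for name in samples for value in spots[name])


# Illustrative values only; HP-Y has a negative first channel.
gene_spots = {"HP-X": (120.0, 95.0), "HP-Y": (-4.0, 60.0)}
current_hp_only = passes_positive_intensity(gene_spots, ["HP-X"])
both_samples = passes_positive_intensity(gene_spots, ["HP-X", "HP-Y"])
```

Changing the list of sample names plays the role of the "Check spots
for positive values mode" choice in this sketch.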
The Filter by Good Spot data submenu contains options that select
spots based on their quality. It filters out genes that do not have
"Good Spot" values defined by the optional QualCheck spot data (see
the list of codes in Appendix C.4). If there is no such spot quality
data, then all spots are considered "good". The filter is enabled by
setting the "Filter by spots with Good Spot values" checkbox. All
spots for the specified samples must meet the criteria. In the "Check
spots for Good Spot mode" submenu, you may choose the samples to
which the test is applied: spots from the current HP, the single
(HP-X,HP-Y) samples, (HP-X,HP-Y) 'sets' (replicated spots), or
samples in the HP-E list selected to be used in the filter.
The Filter by Spot Detection Value data submenu contains options that
select spots based on their spot detection value, a quality metric
in the range [0.0 : 1.0]. The filter is available only if the data
exists for your database and is ignored otherwise. If active, it pops
up a "Spot Detection Value" slider in the range [0.0 : 1.0]. Only
spots with a value greater than the slider value pass the filter.
This data could be the Affymetrix MAS5.0 "Detection p-value" or some
other metric correlated with spot detection quality. The filter is
enabled by setting the "Filter by per-sample Spot Detection Value"
checkbox. All spots for the specified samples must meet the criteria.
In the "Check spots for Spot Detection Value mode" submenu, you may
choose the samples to which the test is applied: spots from the
current HP, the single (HP-X,HP-Y) samples, (HP-X,HP-Y) 'sets'
(replicated spots), or samples in the HP-E list selected to be used
in the filter.
The Filter by spot intensity [SI1:SI2] sliders submenu contains
options that determine how individual spot intensity thresholding is
to be applied in the Filter.
The Filter by [I1:I2] sliders submenu contains options that
determine how spot expression (intensity or (Cy3/Cy5) ratio value)
thresholding is to be applied in the Filter.
The Filter by ratio or Zdiff sliders submenu contains options
that determine how spot-ratio thresholding is to be applied in the
Filter. The spot ratio is (mean HP-X / mean HP-Y) for sets of
samples. The spot Zdiff is used instead if one of the Zscore
normalization methods is active and is computed as
(mean HP-X - mean HP-Y) for sets of samples.
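The two statistics just defined can be sketched in a few lines of
Python. This is only an illustration of the formulas in the text.
The function names, the zscore_active flag, and the inclusive range
test are assumptions, and the sketch does not guard against a zero
mean in HP-Y.

```python
from statistics import mean
from typing import Sequence


def spot_ratio_or_zdiff(hp_x: Sequence[float], hp_y: Sequence[float],
                        zscore_active: bool) -> float:
    """Ratio of set means, or difference of set means under Zscore normalization."""
    mean_x, mean_y = mean(hp_x), mean(hp_y)
    if zscore_active:
        return mean_x - mean_y   # Zdiff
    return mean_x / mean_y       # ratio; assumes mean_y is not zero


def inside_range(value: float, low: float, high: float) -> bool:
    """Assumed inclusive test against a pair of slider thresholds."""
    return low <= value <= high


# Illustrative replicate values for one gene in the HP-X and HP-Y sets.
ratio = spot_ratio_or_zdiff([200.0, 400.0], [100.0, 200.0], zscore_active=False)
zdiff = spot_ratio_or_zdiff([1.5, 2.5], [0.5, 1.5], zscore_active=True)
```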
The Filter by Cy3/Cy5 HP-X ratio or Zdiff sliders submenu
contains options that determine how spot Cy3/Cy5 HP-X ratio
thresholding is to be applied in the Filter. The spot ratio is Cy3/Cy5
for normalized data unless one of the Zscore methods is used. In that
case, the Zdiff is used and is computed as (Cy3 - Cy5) for sets of
samples. If HP-X 'sets' are used, then the mean Cy3 value and the
mean Cy5 value are computed and used in the above computations.
The Filter by spot CV submenu contains options that specify how the
coefficient of variation (CV) of the (F1,F2) or (HP-X,HP-Y) 'sets'
(replicated spots) is to be used in the filter. The (F1,F2) CV is
available only if there are duplicate spots on the HPs.
Figure 2.4.3.1 Filtering using multiple scrollers. This example
is of Cy3/Cy5 time series data. It filters normalized spot intensity
of the Cy3 and Cy5 channels independently ([SI1:SI2] inside range)
where low intensity spots are eliminated. It then filters out genes
outside of the [R1:R2] ratio range.
Figure 2.4.3.2 Using the Positive Intensity data Filter.
This allows removing negative data if the data contains negative
intensity values (e.g. some Affymetrix data has negative Average
Difference values which could be read as Intensity for MAExplorer).
Filtering using a statistical test by selecting a p-value
These tests keep the genes for which the resulting p-value of the
test is <= the value specified by the p-Value state slider. Only one
test may be active at a time. If you switch to a new p-value test, it
will disable the previous p-value test. If any of these tests is
selected, it will pop up the p-Value state slider window for you to
set the p-Value. There are two t-tests: one operating on duplicate
(F1,F2) data if available, and one operating on the HP-X,HP-Y 'sets'
if they are defined. The Kolmogorov-Smirnov test operates on
HP-X,HP-Y 'sets' if they are defined. The F-test operates on the
current Ordered Condition List (OCL) consisting of any number of
condition lists, each containing at least 2 (replicate) samples per
condition.
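The selection behavior described above (one active test, a single
p-value threshold) can be sketched as a small state object. This is a
hypothetical illustration, not MAExplorer's implementation. The class
PValueFilter, the test labels, and the default threshold of 0.05 are
assumptions, and the p-values are taken as already computed by
whichever test is active. The sketch also assumes that genes with a
p-value at or below the threshold are the ones kept.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the four tests described in the text.
TESTS = (
    "t-test F1,F2",
    "t-test HP-X,HP-Y sets",
    "KS-test HP-X,HP-Y sets",
    "F-test OCL",
)


class PValueFilter:
    """Tracks the single active p-value test and the slider threshold."""

    def __init__(self, p_threshold: float = 0.05) -> None:
        # 0.05 is an arbitrary starting value chosen for this sketch.
        self.p_threshold = p_threshold
        self.active_test: Optional[str] = None

    def select(self, test_name: str) -> None:
        if test_name not in TESTS:
            raise ValueError(f"unknown test: {test_name}")
        # Selecting a new test replaces (disables) the previous one.
        self.active_test = test_name

    def passing_genes(self, p_values: Dict[str, float]) -> List[str]:
        """p_values maps gene name -> p-value computed by the active test."""
        if self.active_test is None:
            return list(p_values)  # no test active: nothing is removed
        return [gene for gene, p in p_values.items() if p <= self.p_threshold]


flt = PValueFilter(p_threshold=0.01)
flt.select("t-test F1,F2")
flt.select("KS-test HP-X,HP-Y sets")  # the t-test is no longer active
kept = flt.passing_genes({"geneA": 0.002, "geneB": 0.2, "geneC": 0.01})
```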
Filtering out genes with high replicate spot variation
The Spot CV filter mode submenu contains options to select how
the spot CV filter is to be applied. It computes the maximum value of
CV for all of the samples in the particular sample set specified. That
maximum value is then used for the spot CV filter test. Genes having
a large difference between the spot quantification values of
corresponding duplicate spots may be filtered out. You may compute
the coefficient of variation CV_j for the two values (f1_j and f2_j)
for a particular gene j:
CV_j = 2 |f1_j - f2_j| / (f1_j + f2_j)
If the database only has one field but replicate HPs, then you may use
the HP-X & HP-Y 'sets' CV_j to filter the genes. The CV_j values are
then tested against a CV threshold slider value to eliminate genes
with a high coefficient of variation.
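A worked sketch of the CV computation and the maximum-over-samples
rule described above is given here. The formula follows the text. The
function names, the sample values, and the use of "<=" for the
threshold comparison are assumptions, and the sketch assumes that the
sum of the two values is not zero.

```python
from typing import Iterable, Tuple


def spot_cv(f1: float, f2: float) -> float:
    """CV for one pair of duplicate values: 2|f1 - f2| / (f1 + f2)."""
    return 2.0 * abs(f1 - f2) / (f1 + f2)  # assumes f1 + f2 != 0


def max_cv(pairs: Iterable[Tuple[float, float]]) -> float:
    """Largest CV over the (f1, f2) pairs of the selected samples."""
    return max(spot_cv(f1, f2) for f1, f2 in pairs)


def passes_cv_filter(pairs: Iterable[Tuple[float, float]], cv_threshold: float) -> bool:
    # Assumption: a gene is kept when its maximum CV does not exceed the threshold.
    return max_cv(pairs) <= cv_threshold


# One gene measured in two samples; the second sample has poorer agreement.
gene_pairs = [(110.0, 90.0), (150.0, 50.0)]
worst = max_cv(gene_pairs)
```

For the first pair the CV is 2 * 20 / 200 = 0.2 and for the second it
is 2 * 100 / 200 = 1.0, so the maximum used by the filter is 1.0.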
2.4.3.1 Data filtering using multiple gene data filters
Any or all of the data filters may be selected simultaneously. In
particular, if you select filters that use parameter threshold
scrollers, they will be added to a state scroller window (see Figure
2.3.4.1 for details) to allow adjustment of ALL sliders
simultaneously. You may change various thresholds and see the effect
in real time. Note: some of the scrollers are more sensitive to low
values. Therefore, we set them to respond non-linearly with a more
precise vernier at the low end.
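The manual does not state which non-linear response the scrollers
use. Purely as an illustration of the idea, the sketch below maps a
scroller position in [0.0 : 1.0] to a threshold through a power
curve, which gives finer steps at the low end. The function name
scroller_value and the exponent of 2 are assumptions.

```python
def scroller_value(position: float, max_value: float, power: float = 2.0) -> float:
    """Map a scroller position in [0, 1] to a threshold in [0, max_value].

    A power greater than 1 compresses the low end, so equal movements of
    the scroller change small thresholds by smaller amounts.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0.0, 1.0]")
    return max_value * position ** power


# Same scroller movement (0.1) at the low end and at the high end.
low_step = scroller_value(0.2, 1.0) - scroller_value(0.1, 1.0)
high_step = scroller_value(1.0, 1.0) - scroller_value(0.9, 1.0)
```

In this sketch the low-end step is several times smaller than the
high-end step, which is the vernier-like behavior the text describes.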