2.4.3 Filter menu

The final set of genes presented for display, plotting, reports, etc. is determined by a cascade of gene "data filters" that generates a restricted gene set. The cascade is computed in real time as the intersection of the individual criteria and tests selected by the user. Examples of filter criteria include membership in a particular gene set, a ratio (HP-X/HP-Y) within a range, and passing a statistical test such as a t-test or F-test.
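
The cascade can be pictured as repeated set intersection. The following is a minimal Python sketch under stated assumptions, not MAExplorer's actual implementation: the function name `apply_filter_cascade`, the record fields `ratio` and `p_value`, and the example thresholds are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical per-gene record: measurement name -> value.
GeneRecord = Dict[str, float]
GeneFilter = Callable[[GeneRecord], bool]


def apply_filter_cascade(genes: Dict[str, GeneRecord],
                         filters: Iterable[GeneFilter]) -> Set[str]:
    """Return the names of genes that pass every enabled filter."""
    passing = set(genes)  # start with all genes
    for gene_filter in filters:
        # Intersect the working set with the genes passing this filter.
        passing &= {name for name in passing if gene_filter(genes[name])}
    return passing


# Illustrative data only; not taken from any real database.
example_genes = {
    "geneA": {"ratio": 2.5, "p_value": 0.01},
    "geneB": {"ratio": 0.9, "p_value": 0.02},
    "geneC": {"ratio": 3.0, "p_value": 0.20},
}
example_filters = [
    lambda g: 2.0 <= g["ratio"] <= 4.0,  # ratio (HP-X/HP-Y) within a range
    lambda g: g["p_value"] <= 0.05,      # statistical test passed
]
print(sorted(apply_filter_cascade(example_genes, example_filters)))
```

In this example only geneA satisfies both criteria. Because the result is an intersection, the order in which the filters are applied does not change the final gene set.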

Filter menu and Ratio filter submenu showing options and Preference State sliders of active filters

Figure 2.4.3 Filter menu. The Filter menu is a cascade of data filters: only genes that pass every enabled filter, under the criteria set for that filter, remain in the working gene set. This figure shows the GeneClass filter set to "All genes and ESTs", with the spot CV filter and Ratio (X/Y) range filters being set interactively by the scroll bars on the right. The genes that pass the filter are indicated with a red (white) circle in the array intensity (ratio) pseudoarray image.


The Filter menu options are used to restrict the set of genes by pre-filtering the data with a series of cascaded filter criteria and tests. The resulting subset of genes passing the filter is then used in the plots, reports and other data analysis methods. Some of the filters require additional parameters that are set by the State scrollers. The user will automatically be prompted for changes to these scrollers (a threshold scrollers window will pop up) when the filter is activated or changed. These values may also be set from the Adjust all Filter threshold scrollers entry in the Preferences submenu in the Edit menu. The filters are divided into subgroups in the menu, grouped by the type of criterion (i.e. gene set membership, data range, or statistical test).

The Filter by positive intensity data submenu filter contains options that specify which spot intensity values are to be considered when excluding negative quantified spot data. Note: this filter only makes sense if your data might have negative values (e.g. Affymetrix chip "Avg Diff" data) or a background corrected value that is less than 0.0. Negative intensity values may occur with some types of array quantification programs. The filter is enabled by setting the "Filter by spots with positive intensity" checkbox. In the "Check spots for positive values mode" submenu, you may select the samples to which the test is applied: spots from the current HP, the single (HP-X,HP-Y) samples, the (HP-X,HP-Y) 'sets' (replicated spots), or the samples in the HP-E list selected to be used in the filter. If there are (F1,F2) or (Cy3/Cy5) data, then each spot must meet the threshold criteria.
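
The "each spot must meet the criteria" rule can be sketched in a few lines of Python. This is an illustration only: the function name `passes_positive_intensity` is hypothetical, and treating a value of exactly 0.0 as failing is an assumption that the text does not settle.

```python
from typing import Iterable


def passes_positive_intensity(spot_values: Iterable[float]) -> bool:
    """True only if every checked spot value is strictly positive.

    Assumption: a value of exactly 0.0 is treated as failing.
    """
    return all(value > 0.0 for value in spot_values)


# Hypothetical gene with (F1, F2) values collected from two samples.
all_positive = [120.0, 98.5, 15.2, 30.1]
one_negative = [120.0, -4.3, 15.2, 30.1]
print(passes_positive_intensity(all_positive))
print(passes_positive_intensity(one_negative))
```

The list passed in would hold whichever spots the selected mode covers (current HP, HP-X and HP-Y, the 'sets', or the HP-E list); a single negative value is enough to exclude the gene.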

The Filter by Good Spot data submenu filter contains options that specify spots based on their quality. It filters out genes that do not have "Good Spot" values as defined by the optional QualCheck spot data (see the list of codes in Appendix C.4). If there is no such spot quality data, then all spots are considered "good". The filter is enabled by setting the "Filter by spots with Good Spot values" checkbox. All spots for the specified samples must meet the criteria. In the "Check spots for Good Spot mode" submenu, you may select the samples to which the test is applied: spots from the current HP, the single (HP-X,HP-Y) samples, the (HP-X,HP-Y) 'sets' (replicated spots), or the samples in the HP-E list selected to be used in the filter.

The Filter by Spot Detection Value data submenu filter contains options that specify spots based on their spot detection value quality metric over the range of [0.0 : 1.0]. The filter is available only if the data exist for your database and is ignored otherwise. If active, it pops up a "Spot Detection Value" slider in the range of [0.0 : 1.0]. Only spots with a value greater than the slider value pass the filter. This data could be the Affymetrix MAS5.0 "Detection p-value" or some other metric correlated with spot detection quality. The filter is enabled by setting the "Filter by per-sample Spot Detection Value" checkbox. All spots for the specified samples must meet the criteria. In the "Check spots for Spot Detection Value mode" submenu, you may select the samples to which the test is applied: spots from the current HP, the single (HP-X,HP-Y) samples, the (HP-X,HP-Y) 'sets' (replicated spots), or the samples in the HP-E list selected to be used in the filter.

The Filter by spot intensity [SI1:SI2] sliders submenu contains options that determine how individual spot intensity thresholding is applied in the Filter.

The Use data mode submenu contains options that specify which spot intensity values are used in the filter: those of the single sample (F1 and F2 replicated spot intensity data, or Cy3/Cy5 for ratio data), or those of the (HP-X,HP-Y) 'sets' of replicated samples. If there are single sample (F1,F2) or (Cy3/Cy5) data, then each spot must meet the threshold criteria.

The Compare channels meeting range submenu specifies which additional constraints are to be used. If required by the (AT MOST channels, AT LEAST channels, PRODUCT OF channels, SUM OF channels) commands, the Percent SI OK scroll bar will appear, covering the range 0% to 100%.

The Filter by [I1:I2] sliders submenu contains options that determine how spot expression (intensity or (Cy3/Cy5) ratio value) thresholding is applied in the Filter.

The Filter by ratio or Zdiff sliders submenu contains options that determine how spot-ratio thresholding is applied in the Filter. The spot ratio is mean HP-X / mean HP-Y for sets of samples. The spot Zdiff is used if one of the Zscore normalization methods is active and is computed as (mean HP-X - mean HP-Y) for sets of samples.
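
The two quantities can be written out as a short Python sketch. The function names `ratio_or_zdiff` and `in_slider_range`, the `zscore_active` flag, and the inclusive range test are hypothetical choices for illustration; the menu offers several range modes that are not modeled here.

```python
from statistics import mean
from typing import Sequence


def ratio_or_zdiff(hpx_values: Sequence[float],
                   hpy_values: Sequence[float],
                   zscore_active: bool = False) -> float:
    """Return mean HP-X / mean HP-Y, or the Zdiff if Zscore data is in use."""
    mean_x = mean(hpx_values)
    mean_y = mean(hpy_values)
    if zscore_active:
        return mean_x - mean_y   # Zdiff for Zscore-normalized data
    return mean_x / mean_y       # ratio for other normalizations


def in_slider_range(value: float, low: float, high: float) -> bool:
    """Assumed inclusive test against the two slider values."""
    return low <= value <= high


# Illustrative replicate values for one gene.
ratio = ratio_or_zdiff([4.0, 6.0], [2.0, 3.0])
zdiff = ratio_or_zdiff([1.0, 2.0], [0.5, 0.5], zscore_active=True)
print(ratio, zdiff, in_slider_range(ratio, 1.5, 3.0))
```

A difference is used for Zscore data because such values can be zero or negative, where a quotient would be undefined or misleading.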

The Filter by Cy3/Cy5 HP-X ratio or Zdiff sliders submenu contains options that determine how spot Cy3/Cy5 HP-X ratio thresholding is applied in the Filter. The spot ratio is Cy3/Cy5 for normalized data unless one of the Zscore methods is used. In that case, the Zdiff is used and is computed as (Cy3 - Cy5) for sets of samples. If HP-X 'sets' are used, then the mean Cy3 value and the mean Cy5 value are computed and used in the above computations.

The Filter by spot CV submenu filter contains options that specify how the Coefficient Of Variation of the (F1,F2) or (HP-X,HP-Y) 'sets' (replicated spots) is to be used in the filter. The (F1,F2) CV is available only if there are duplicate spots on the HPs.

Filtering using statistical tests by selecting a p-value

These tests pass the genes for which the resulting p-value of the test is <= the value specified by the p-Value state slider. Only one test may be active at a time. If you switch to a new p-value test, it will disable the previous p-value test. If any of these tests is selected, it will pop up the p-Value state slider window for you to set the p-Value. There are two t-tests: one operating on duplicate (F1,F2) data if available, and one operating on the HP-X,HP-Y 'sets' if they are defined. The Kolmogorov-Smirnov test operates on HP-X,HP-Y 'sets' if they are defined. The F-test operates on the current Ordered Condition List (OCL) consisting of any number of condition lists, each containing at least 2 (replicate) samples/condition.
  • Filter by current Ordered Condition List (OCL) F-Test [p-Value] slider [RB] - only include genes that meet the F-test criteria on the current OCL. This only works if there are at least 2 (replicate) samples/condition for each of the condition sets in the OCL. See info on defining the OCL and using the OCL data.

    Filtering out genes with high replicate spot variation

    The Spot CV filter mode submenu contains options to select how the spot CV filter is to be applied. It computes the maximum value of CV for all of the samples in the particular sample set specified. That maximum value is then used for the spot CV filter test. Genes having a large difference between the spot quantification values of corresponding duplicate spots may be filtered out. You may compute the coefficient of variation CVj for the two values (f1j and f2j) for a particular gene j:
        CVj = 2|f1j-f2j|/(f1j+f2j)
    
    If the database only has one field but replicate HPs, then you may use the HP-X & HP-Y 'sets' CVj to filter the genes. The CVj values are then tested against a CV threshold slider value to eliminate genes with a high coefficient of variation.
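
The duplicate-spot CV test can be sketched in Python as follows. The formula is the one given above; the function names `spot_cv` and `passes_cv_filter`, the handling of a zero denominator, and the use of <= for the threshold comparison are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def spot_cv(f1: float, f2: float) -> float:
    """CVj = 2|f1j - f2j| / (f1j + f2j) for one pair of duplicate spots."""
    total = f1 + f2
    if total == 0.0:
        # Assumption: an undefined CV is treated as infinitely large.
        return float("inf")
    return 2.0 * abs(f1 - f2) / total


def passes_cv_filter(duplicate_pairs: Iterable[Tuple[float, float]],
                     cv_threshold: float) -> bool:
    """Test the maximum CV over the selected samples against the slider."""
    max_cv = max(spot_cv(f1, f2) for f1, f2 in duplicate_pairs)
    return max_cv <= cv_threshold


# Illustrative (F1, F2) pairs for one gene in two samples.
pairs = [(100.0, 80.0), (50.0, 50.0)]
print(round(spot_cv(100.0, 80.0), 4))
print(passes_cv_filter(pairs, 0.25), passes_cv_filter(pairs, 0.20))
```

Using the maximum means that one sample with poorly matching duplicates is enough to remove the gene, even if the remaining samples agree closely. The sketch assumes non-negative intensities.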

    2.4.3.1 Data filtering using multiple gene data filters

    Any or all of the data filters may be selected simultaneously. In particular, if you select filters that use parameter threshold scrollers, they will be added to a state scroller window (see Figure 2.3.4.1 for details) so that all sliders can be adjusted together. You may change various thresholds and see the effect in real time. Note: some of the scrollers are more sensitive at low values. Therefore, they are set to respond non-linearly, giving a more precise vernier at the low end.
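
The manual does not say which non-linear mapping the scrollers use. Purely as an illustration of the idea, the Python sketch below assumes a power-law mapping; the function name `slider_to_threshold` and the `power` parameter are hypothetical and are not MAExplorer's actual behavior.

```python
def slider_to_threshold(position: float, max_value: float,
                        power: float = 2.0) -> float:
    """Map a slider position in [0, 1] to a threshold in [0, max_value].

    Assumption: a power law with power > 1, so equal slider steps give
    smaller threshold steps near the low end of the range.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    return max_value * position ** power


# Compare the threshold change for equal slider moves at each end.
low_step = slider_to_threshold(0.2, 1.0) - slider_to_threshold(0.1, 1.0)
high_step = slider_to_threshold(0.9, 1.0) - slider_to_threshold(0.8, 1.0)
print(low_step < high_step)
```

With any mapping of this kind, the same physical slider movement changes the threshold less at the low end than at the high end, which is the vernier effect described.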

    Filtering using multiple filters (Spot Intensity and Ratios of samples) and multiple thresholds

    Figure 2.4.3.1 Filtering using multiple scrollers. This example is of Cy3/Cy5 time series data. It filters normalized spot intensity of the Cy3 and Cy5 channels independently ([SI1:SI2] inside range) where low intensity spots are eliminated. It then filters out genes outside of the [R1:R2] ratio range.

    Filtering using positive intensity data - ignoring negative data

    Figure 2.4.3.2 Using the Positive Intensity data Filter. This allows removing negative data if the data contain negative intensity values (e.g. some Affymetrix data have negative Average Difference values, which could be read as Intensity for MAExplorer).