MAExplorer - Microarray Exploratory Data Analysis

2.4.2 Normalization menu

The Normalization menu includes operations to normalize gene intensity data between hybridized samples. Normalization is critical for comparing samples because of differences in the amount of sample, labeling efficiency, and variations in scanner operation, including gain and baseline settings. Several methods are available, including normalizing by Zscore, median, log median, Zscore of logs, calibration DNA, housekeeping genes, etc. The specific microarray image quantification is determined by the image analysis program used to pre-process the arrays.

Note: although this set of normalization methods is limited, it is adequate for some analyses of the data. We are in the process of adding more normalization methods through MAEPlugin methods.

Zscore of intensity [RB] - normalize by the (intensity-mean)/stdDev of raw intensities for all spots in each sample.
Median intensity [RB] - normalize by the median of the raw intensities for all spots in each sample (the default normalization).
Log median intensity [RB] - normalize by the log of median scaled raw intensities for all spots in each sample.
Zscore log intensity, stdDev [RB] - normalize by the Zscore of the log intensity using (log(intensity)-mean_log)/stdDev_log, standard deviation for all spots in each sample.
Zscore log intensity, mnAbsDev [RB] - normalize by the Zscore of the log intensity using (log(intensity)-mean_log)/meanAbsDev_log, mean absolute deviation for all spots in each sample.
By Calibration DNA set of genes [RB] - normalize by the sum of the 'Calibration DNA' genes for each sample (if it exists in your database).
By 'User Normalization Gene Set' [RB] - normalize by the sum of the genes in a user defined gene set in each sample. You assign this gene set using the (Edit menu | Gene sets | Assign 'User Normalization Gene Set') operation.
By housekeeping gene set [RB] - normalize each HP data set by the sum of the intensity values for known housekeeping genes in each sample (if it exists in your database).
Scale intensity data to 65K [RB] - scale the data for the microarray by 65535/maxIntensity for each sample.
Unnormalized [RB] - do not scale data between samples. I.e. use the raw data.
----------------------
Use background intensity correction [CB] - enable/disable background correction to gene intensity measurements.
Use ratio median intensity correction [CB] - enable/disable ratio median correction to clone intensity measurements by multiplying the ratio (Cy3/Cy5) by the ratio of the median intensities, medianCy5/medianCy3. If background correction is enabled, correct by (medianCy5-medianBkgdCy5)/(medianCy3-medianBkgdCy3).
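
The ratio median correction described in the last menu entry can be sketched as follows. This is a minimal illustration in Python, not MAExplorer code: the function names and the use of per-spot background lists are assumptions made for the example.

```python
from statistics import median

def median_correction_factor(cy3, cy5, bkgd_cy3=None, bkgd_cy5=None):
    """Return medianCy5/medianCy3, optionally background corrected.

    cy3, cy5: raw spot intensities for one hybridized sample.
    bkgd_cy3, bkgd_cy5: optional per-spot background values (assumed layout).
    """
    med3, med5 = median(cy3), median(cy5)
    if bkgd_cy3 is not None and bkgd_cy5 is not None:
        # (medianCy5 - medianBkgdCy5) / (medianCy3 - medianBkgdCy3)
        med3 -= median(bkgd_cy3)
        med5 -= median(bkgd_cy5)
    return med5 / med3

def corrected_ratios(cy3, cy5):
    """Multiply each Cy3/Cy5 spot ratio by the median correction factor."""
    factor = median_correction_factor(cy3, cy5)
    return [(g / r) * factor for g, r in zip(cy3, cy5)]
```

After the correction, a spot whose Cy3/Cy5 ratio equals the ratio of the channel medians has a corrected ratio of 1.0.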

2.4.2.1 Intensity background correction

The background intensity data from the spot quantification programs may be used to correct spot intensity. Background may be specified as either a global value or on a per-spot basis. If the array images have low background, the absence of background values may not be much of a problem.

Some quantification software (e.g. Research Genetics' Pathways 2.01) measures background globally as BGLow (low background), BGAvg (average background), and BGRms (root mean square background). For MGAP, MAExplorer uses the BGLow value when you request background subtraction. These values are read from the MAExplorer Samples DB file (see Appendix Table C.2.1.1). For other quantification programs, background may be available on a per-spot basis in the quantification files. If the latter is available in your data, it will be used if background correction is enabled (see Appendix C.3).

The background corrected intensity I'_ij is computed from the raw intensity I_ij and background intensity bkgrd_HPi for H.P. i and spot j as follows:

      I'_ij = I_ij - bkgrd_HPi
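
As an illustration, the subtraction can be sketched in Python for both the global and the per-spot case. The function name and the convention of passing either a single number or a list are assumptions made for this example, not part of MAExplorer.

```python
def background_correct(intensities, background):
    """Subtract background from the raw spot intensities of one sample.

    background: a single global value (e.g. a BGLow value) or a list
    with one background value per spot (hypothetical calling convention).
    """
    if isinstance(background, (int, float)):
        # Global background: the same value is subtracted from every spot.
        return [i - background for i in intensities]
    if len(background) != len(intensities):
        raise ValueError("need one background value per spot")
    # Per-spot background: subtract spot by spot.
    return [i - b for i, b in zip(intensities, background)]
```

For example, background_correct([100, 250, 80], 30) subtracts the global value 30 from each of the three spots.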

Ratio computation for Cy3 and Cy5 data

For most MAExplorer operations, the intensity of a gene is generally computed as the mean intensity of the spots (background corrected or not) which duplicate that gene on the microarray. When working with dual hybridized samples using Cy3- and Cy5-dUTP labeling, which results in green and red fluorescence respectively, the Cy3/Cy5 ratio can be used to self-normalize the intensity for each hybridized clone array. If local background is available, then the ratio can be computed for HP h and spot j as

  (Cy3_hj - BkgrdCy3_hj) / (Cy5_hj - BkgrdCy5_hj)
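
A minimal Python sketch of this per-spot ratio follows. The function name is hypothetical, and returning None when the corrected Cy5 value is not positive is an assumption of the example; the text above does not say how MAExplorer treats such spots.

```python
def cy3_cy5_ratio(cy3, cy5, bkgd_cy3=0.0, bkgd_cy5=0.0):
    """Background corrected Cy3/Cy5 ratio for a single spot.

    With the default background of 0.0 this is the plain Cy3/Cy5 ratio.
    """
    denominator = cy5 - bkgd_cy5
    if denominator <= 0:
        # Assumption: the ratio is treated as undefined for such spots.
        return None
    return (cy3 - bkgd_cy3) / denominator
```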

2.4.2.2 Normalization between microarrays to allow comparison

The normalization of quantitative data is crucial when comparing data between different microarray samples. There are a number of different schemes possible. One is to normalize by the sum of known calibration genes, housekeeping genes, or other "constant expression" genes in the microarray. Another is to sum the background corrected integrated density for all spots in an array and to normalize individual gene measurements by that sum. These methods are now described in more detail. As the MAEPlugins facility becomes available, we will be adding a number of more sophisticated gene-specific normalization methods that take many of the problems specific to microarrays into account.

Normalizing by scaled Zscore of intensity

The "normalized Zscore of intensity" method normalizes each hybridized sample by the mean and standard deviation of the raw intensities for all of the spots in that sample. The mean intensity mnI_i and the standard deviation sdI_i are computed for the raw intensity of 'Good genes'. It is useful for standardizing the mean (to 0.0) and the range of data between hybridized samples to about -4.0 to +4.0. When using the Zscore, you compute Zdiff(erences) not ratios. The Zscore intensity Zscore_ij for intensity I_ij for HP i and spot j is computed as

     Zscore_ij = (I_ij - mnI_i)/sdI_i,
and
     Zdiff_j(x,y) = Zscore_xj - Zscore_yj.
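
The two formulas can be sketched in Python as shown here. This is only an illustration: the function names are hypothetical, the input is assumed to contain only the 'Good genes' spots, and the sample (n-1) standard deviation is an assumption, since the text does not say which form is used.

```python
from statistics import mean, stdev

def zscores(intensities):
    """Zscore_ij = (I_ij - mnI_i)/sdI_i for all spots j of one sample i."""
    mn = mean(intensities)
    sd = stdev(intensities)  # assumption: sample standard deviation
    return [(i - mn) / sd for i in intensities]

def zdiff(zscores_x, zscores_y):
    """Zdiff_j(x,y) = Zscore_xj - Zscore_yj, spot by spot."""
    return [zx - zy for zx, zy in zip(zscores_x, zscores_y)]
```

Because each sample is standardized separately, two samples are compared by the difference of their Zscores rather than by a ratio.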

Normalizing by the median of intensity

The "Median intensity" method normalizes each hybridized sample by the median of the raw intensities of 'Good genes' for all of the spots in that sample. It is a useful normalization to use when you want to compute X/Y ratios between hybridized samples.

     Im_ij = (I_ij/ medianI_i)
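
A minimal Python sketch of the median normalization follows. The function name is hypothetical and the input is assumed to contain only the 'Good genes' spots of one sample.

```python
from statistics import median

def median_normalize(intensities):
    """Im_ij = I_ij / medianI_i for all spots j of one sample i."""
    med = median(intensities)
    return [i / med for i in intensities]
```

After this step the median spot of every sample has the value 1.0, so X/Y ratios between samples are taken on a common scale.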

Normalizing by the log of median of intensity

The "Log median intensity" method normalizes each hybridized sample by the log of median scaled raw intensities of 'Good genes' for all of the spots in that sample. The value 1.0 is added to the intensity value to avoid taking the log(0.0) when intensity has zero value. This is a useful normalization to use when you want to compute X/Y ratios between hybridized samples and compress the scale. Because we are computing a log, we report the difference between HP-X and HP-Y as (X-Y) instead of a ratio (X/Y).

     Im_ij = log(1.0 + (I_ij/ medianI_i))
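
The same computation in a short Python sketch. The function name is hypothetical, and the natural logarithm is an assumption because the base of the log is not stated here.

```python
import math
from statistics import median

def log_median_normalize(intensities):
    """Im_ij = log(1.0 + I_ij/medianI_i) for all spots j of one sample i."""
    med = median(intensities)
    # Adding 1.0 keeps the log defined when a spot intensity is zero.
    return [math.log(1.0 + i / med) for i in intensities]
```

A spot with zero intensity maps to log(1.0) = 0.0, and two samples are then compared by the difference of their normalized values.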

Normalizing by scaled Zscore of log intensity, standard deviation

The "Normalize by Zscore of log intensity, stdDev" method normalizes each hybridized sample by the mean and standard deviation of the logs of the raw intensities for all of the spots in that sample. The mean log intensity mnLI_i and the standard deviation log intensity sdLI_i are computed for the log of raw intensity of 'Good genes'. Then the Zscore intensity ZlogS_ij for HP i and spot j is

     ZlogS_ij = (log(I_ij) - mnLI_i)/sdLI_i

Normalizing by scaled Zscore mean absolute deviation of log intensity

The "Normalize by Zscore of log intensity, mean absolute deviation" method normalizes each hybridized sample by the mean and mean absolute deviation of the logs of the raw intensities for all of the spots in that sample. The mean log intensity mnLI_i and the mean absolute deviation log intensity madLI_i are computed for the log of raw intensity of 'Good genes'. Then the Zscore intensity ZlogA_ij for HP i and spot j is

     ZlogA_ij = (log(I_ij) - mnLI_i)/madLI_i
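
The two Zscore of log intensity variants differ only in the measure of spread, which the following Python sketch makes explicit. The function name and the use_mean_abs_dev flag are hypothetical, the natural logarithm and the sample standard deviation are assumptions, and all intensities are assumed to be positive because log(I_ij) is taken directly.

```python
import math
from statistics import mean, stdev

def zscore_log(intensities, use_mean_abs_dev=False):
    """Zscore of log intensity for all spots of one sample.

    use_mean_abs_dev=False gives ZlogS_ij (divide by sdLI_i);
    use_mean_abs_dev=True gives ZlogA_ij (divide by madLI_i).
    """
    logs = [math.log(i) for i in intensities]  # requires intensities > 0
    mn = mean(logs)
    if use_mean_abs_dev:
        spread = mean([abs(v - mn) for v in logs])  # mean absolute deviation
    else:
        spread = stdev(logs)  # assumption: sample standard deviation
    return [(v - mn) / spread for v in logs]
```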

By 'User Normalization Gene Set'

This method is useful when a subset of genes has been determined to have relatively constant expression across the set of samples. It normalizes by the sum of intensities for a subset of genes defined by the user in the 'User Normalization Gene Set' (Section 2.3.2) using the gene set editing commands. Normalizing by the sum of genes uses the sum Igs_i that is computed for microarray HPi from the intensities I_ij for all genes j in the gene subset.

     Igs_i = Sum_{genes j in the gene set, for HPi} (I_ij)

Then, the normalized intensity I'_ij is computed as:

      I'_ij =  I_ij/Igs_i
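
A minimal Python sketch of normalizing one sample by the sum of a gene set. The function name, the dictionary layout (gene name to intensity), and the gene names in the usage note are hypothetical.

```python
def gene_set_normalize(intensities, gene_set):
    """Divide every intensity of one sample by the gene set sum Igs_i.

    intensities: dict mapping gene name -> intensity I_ij for sample i.
    gene_set: names of the genes in the normalization gene set.
    """
    igs = sum(intensities[g] for g in gene_set)
    if igs == 0:
        raise ValueError("gene set sum is zero; cannot normalize")
    return {gene: value / igs for gene, value in intensities.items()}
```

For a sample {"actb": 100.0, "gapdh": 300.0, "geneA": 200.0} and the gene set ["actb", "gapdh"], the sum Igs_i is 400.0 and geneA is normalized to 0.5. The same sketch applies to the calibration DNA and housekeeping gene sets, which differ only in how the set is defined.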

By 'Calibration DNA' set

If a predefined set of calibration DNA genes is available on the array, it may be used to normalize density values between the samples. The calibration DNA genes are defined by special gene names that are declared in the Configuration file using the 'calibDNAname' parameter (see Appendix C Table C.5.1(C)). If there is no calibration DNA, this entry is not used. The algorithm is the same as "User Normalization Gene Set" (above), but the set is predefined as the genes flagged as calibration DNA. For example, in the MGAP database, these spots are the "mouse genomic DNA" spots, so the Configuration file entry would be calibDNAname="m.g. DNA".

Figure 2.4.2.3 Scatter plot of HP-X and HP-Y 'sets' data. HP-X is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13 filtered by "All named genes and ESTs". A) A scatter plot using the Median normalization. B) A scatter plot using the Zscore of the logs normalization. Notice how the Casein alpha outlier is more apparent in the case of the Zscore log normalization. The skewed plot is characteristic of much microarray data. Some normalization methods (not currently included in MAExplorer) can compensate for some of these artifacts (Dutoit, 2000) and are planned for future MAEPlugins.