MAExplorer - Microarray Exploratory Data Analysis

Appendix B. Advanced tutorial for MAExplorer

There are a number of things you may do in this facility. We wrote this advanced tutorial to help demonstrate some of its capabilities. A short tutorial (Appendix A) is also available, and we recommend doing it before attempting the advanced tutorial. Sources of startup data to use with the tutorials are listed in the short tutorial. As with all tutorials, they are only starting points for getting you into the analysis environment - try out new options on your own; you can't break anything :-).

Analyze expression of individual genes
Analyze expression of gene families and clusters
Compare expression patterns in multiple hybridized samples

NOTE: THIS APPENDIX IS BEING REVISED...

Here are some things to try

Click on the Reference manual in the Help menu. This opens this reference manual in a new Netscape Web browser window so you may read it while working in MAExplorer. Close the window when you are finished with it.
Click on the Glossary in the MAExplorer Help menu. This section describes terms used in MAExplorer. Close the window when you are finished with it.
Click on the Index. It may be useful in finding things that are not in the table of contents.
Note: a hybridized sample is a microarray hybridized with cDNA derived from the mRNA from a particular experiment sample (see notation defined in the Overview). Currently, you may start MAExplorer with a preloaded set of samples by starting the stand-alone MAExplorer on a .mae file. Alternatively, you may start it with no samples loaded. In the latter case, you would load the samples you are interested in from the Samples menu.

When first started, MAExplorer loads some initial data it needs as well as the particular hybridized samples you specified. After MAExplorer starts, it displays "Ready - click on a gene to query database" and the menus become active. Here are some things to try.

Click on different genes (i.e. spots in the pseudoarray image). The pseudoarray image may or may not correspond to the actual array - depending on how the data was derived. Notice the data that gets displayed in the three text lines above the image. If the spot you click on is a named gene (e.g. [1-A4,3] at row 4 column 3 in Field 1), it will also print out the GeneName of the gene.
Look at the pull-down menus. They consist of sets of commands with similar functionality grouped together in sub-menus. In particular, look at the Analysis menu. It contains an ordered list of submenus that may be thought of as the sequence in which one might perform an analysis. In reality, an analysis is more complicated and involves iterating various steps (see Figure 3.1).
Go to the "Samples" menu. This menu allows selecting hybridized samples for the HP-X, HP-Y, HP-X and HP-Y 'sets', and the HP-E list. You have several ways to do this. The easiest way is to use the "Choose HP-X, HP-Y and HP-E" sample selection wizard. The alternative is to use one of the sets of cascading menus to change the selected hybridized samples. Note that the HP-X and HP-Y submenus assign samples for subsequent analysis such as X-Y scatter plots. The HP-E submenu sets the list of samples for expression profiles. Note how you may assign a sample either by going through the cascading menus or from an alphabetic list of all hybridized samples in the pseudoarray image. To change the default HP-X (HP-Y) sample, click on the purple "[X]" ("[Y]") box on the left side of the image (above the list of samples) so it is selected. Then click on the desired purple "*" adjacent to the samples listed on the left edge of the pseudoarray image. You may switch between using single and multiple samples (i.e. 'sets') with HP-X and HP-Y. Note the "HP-X:" and "HP-Y:" labels at top left when you switch between single and multiple samples. Go to the HP-X/-Y 'set' and HP-E submenus and list the contents of the respective sets to see what they contain. Note how one may add or remove samples from these sets.
Go to the "Samples" menu. Then select "Choose named condition list of samples". This lets you define new condition lists of samples. Go to the "Edit" menu then "Sets of Conditions (samples)" to see additional ways to manipulate these condition sets. For example, you might define a new condition and then assign it to the working HP-X 'set'.
Go to the "Samples" menu. Then select "Choose ordered list of conditions". This lets you define a new Ordered Condition List (OCL) or edit an old one. In the included MGAP database, there is a pre-computed example of an Ordered Condition List using 4 conditions of replicates of C57B6 (pregnancy day 13, lactation days 1 and 10, and stat5a(-,-) 15 samples). The database also includes 4 additional condition sets of this data and an Ordered Condition List of the 4 conditions (in the State/ directory). This may be used to demo the OCL F-test filter. Go to the "Filter" menu and select "Filter by current Ordered Condition List (OCL) F-test [p-Value]". This will pop up a p-value slider where you can adjust the criteria for selecting genes passing the F-test.
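As a rough illustration of what an F-test across conditions measures, the sketch below computes a one-way ANOVA F statistic for a single gene. It assumes the OCL F-test is a standard one-way ANOVA over the replicate samples in each condition, which this tutorial does not state. The function name and the intensity values are hypothetical. Converting F to the p-value shown on the slider requires the F distribution and is omitted here.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    `groups` holds one list of replicate values per condition.
    Assumption: this is only a generic textbook F-test, not
    necessarily the computation MAExplorer performs.
    """
    all_values = [v for g in groups for v in g]
    k = len(groups)          # number of conditions
    n = len(all_values)      # total number of samples
    grand = mean(all_values)
    # Variation of the condition means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the replicates around their own condition mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical values for one gene in three conditions of three replicates.
f_value = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F means the condition means differ by more than the replicate scatter would explain, so the gene is more likely to pass a small p-value threshold.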
Go to the "GeneClass" submenu in the "Analysis" menu. This is a set of cascading menus that may be used to change the default Gene Class. Different genes belong to different gene classes and this is a way of sub-setting the data. You may currently set it to All Genes, All Named Genes, ESTs similar to genes, ESTs, Good genes, All named genes and ESTs, Replicate genes (multiple copies on the array), or Calibration DNA. Select "ESTs similar to genes". Notice that the display now only shows the red (white) circles on the ESTs similar to genes in the intensity (ratio) pseudoarray image. Look at the circle overlays on the microarray image - note how they changed. Go back and set the gene class to "All genes" and then "Calibration DNA". The "Calibration DNA" is the set of spots that may be used for normalizing data between microarrays. Check out the "Set gene class subset" submenu. Leave the gene class set to All Genes.
If you are connected to the Internet, go to the "View" Menu. Then turn on the switch "Enable display current gene in popup XXXX Web Browser". Depending on your database, XXXX may be GenBank, LocusID, UniGene, dbEST or mAdb Clone DB. Then click on a gene in the image. It pops up a Web browser showing the genomic database Web page for that gene (if any). If you click on a different spot, it will reuse the popup Web browser with new data. If you don't want it to be active, go back to the "View Menu" and click on "Enable display current gene ..." again to disable it.
Go to the "Normalization" submenu in the "Analysis" menu. This is used for normalizing data between microarrays so they may be compared. The default is to scale the data to the median value for each array. Investigate the other normalization methods. If your quantified data contains spot background data, you may also enable background correction, which subtracts a microarray-specific overall background intensity value from each intensity value in each array.
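The idea behind median scaling and background correction can be sketched as follows. This is a minimal illustration, not MAExplorer's own code: the function names, the target value of 1.0, the clamping of negative values to zero, and the sample intensities are all assumptions made for the example.

```python
from statistics import median

def subtract_background(intensities, background):
    """Subtract one array-wide background value from every spot.

    Assumption: negative results are clamped to zero.
    """
    return [max(v - background, 0.0) for v in intensities]

def scale_to_median(intensities, target=1.0):
    """Scale one array so that its median intensity equals `target`."""
    m = median(intensities)
    if m == 0:
        raise ValueError("median intensity is zero; cannot scale")
    return [target * v / m for v in intensities]

# Two hypothetical arrays of the same genes; array_b is half as bright overall.
array_a = [100.0, 200.0, 400.0]
array_b = [50.0, 100.0, 200.0]

norm_a = scale_to_median(array_a)
norm_b = scale_to_median(array_b)
corrected = subtract_background([100.0, 5.0], 10.0)
```

After scaling, the two arrays have identical normalized values, which is what makes a spot-by-spot comparison between them meaningful.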
Go to the "Report" submenu in the "Analysis" menu. You may generate a report in one of two formats, "Spreadsheet" or "tab-delimited", the latter being cut-and-paste compatible with Excel. In the "Samples Report" sub-submenu in the "Report" submenu, click on "Hybridized Samples", then switch to the other Report format and do it again. Click on "Hybridized Sample Web links". In the "Gene Reports" submenu, try "All named genes". Then try the "Highest HP-X/HP-Y ratio" in the "Filtered gene reports" submenu. If you are using a list of HP-E or HP-X/-Y 'sets' of samples, you might try looking at the expression profile ratios or statistical data respectively through other Reports.
Review the data "Filter" submenu in the "Analysis" menu. Select the "Filter by expression sliders" [I1:I2]. Expression is intensity in the case of ³³P or biotin labeled samples, or (Cy3/Cy5) in the case of ratio data. This pops up a state sliders window. Move the I1 scroll bar (lower limit of the gray value intensity) to about 100. Look at the genes that were eliminated because they are out of the range. If you select the "Filter by spot intensity sliders" [SI1:SI2], it will filter genes by spot intensity in either F1 or F2 duplicate (if you have duplicate spots), or the Cy3 or Cy5 spots. You can select which sets of HPs to use (current HP, HP-X and HP-Y, HP-XY sets, or HP-E). You might disable the [I1:I2] filter while you experiment with the [SI1:SI2] filter. If you have (F1,F2) or (Cy3,Cy5) data, put up the F1 vs F2 or Cy3 vs Cy5 scatter plot and you can visualize how the thresholding works.
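A range filter of this kind amounts to keeping the genes whose value falls between a lower and an upper slider setting. The sketch below shows that logic under stated assumptions: the function name, the gene names and the values are hypothetical, and treating the range as inclusive at both ends is a guess, not documented behavior.

```python
def filter_by_range(values_by_gene, low, high):
    """Return the genes whose value lies in the range [low, high].

    Assumption: both ends of the range are inclusive.
    """
    return [gene for gene, v in values_by_gene.items() if low <= v <= high]

# Hypothetical normalized intensities for three genes.
intensities = {"geneA": 40.0, "geneB": 150.0, "geneC": 900.0}

# Raising the lower limit I1 to 100 removes geneA.
passed = filter_by_range(intensities, 100.0, 1000.0)
```

The same function would serve for a ratio range by passing ratios instead of intensities.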
In the Filter menu, add the "Filter by ratio or Zdiff sliders". Then the [R1:R2] ratio range sliders are added to the state slider window and may be used for filtering genes. If the normalization method is one of the Zscore methods, it filters by the difference of the Zscores and the [Z1:Z2] range is used; otherwise it filters by the ratio. Note that the genes that pass the filter will appear with a red (white) circle in the pseudoarray intensity (ratio) grayscale (pseudocolor) image, or a red "+" in the scatter plots, so you might try moving the controls while in those plot modes. Try some of the other filters. The spot CV test removes genes where replicate spot values (F1 and F2 in the case of a single sample or replicate samples in the case of HP-X and HP-Y 'sets' or the HP-E list of genes) are not well correlated. The t-Test filter may be used with sets of X and Y samples to find genes with a p-value less than the specified threshold.
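To make the spot CV idea concrete, the sketch below uses one common definition of the coefficient of variation, the sample standard deviation divided by the mean. Whether MAExplorer uses exactly this formula is an assumption, and the function names, threshold and replicate values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (one common CV definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m

def passes_cv(values, max_cv):
    """True when the replicate values agree to within `max_cv`."""
    return coefficient_of_variation(values) <= max_cv

# Hypothetical duplicate spot values (F1, F2) for two genes.
replicates = {"geneA": [100.0, 110.0], "geneB": [100.0, 300.0]}

# geneB's duplicates disagree strongly, so it is removed.
kept = [gene for gene, v in replicates.items() if passes_cv(v, 0.25)]
```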
Go to the "Plot" submenu in the "Analysis" menu. Then go to the "Show Microarray" submenu, try the pseudocolor ratio (or Zscore) modes and finally leave it in the pseudograyscale image intensity mode. Try using the "Scatter Plots" and "Histograms" in the Plot menu. Vary the normalization methods and see how they affect the array image and scatter plots. When you are in a scatter plot, click on a point. It will display data similar to when clicking on an image. If UniGene data is available in your database, go to the View menu and set "Enable display current gene in popup UniGene Web browser" and click on a point again. This pops up another Web browser window and looks up that gene in the UniGene database. Change the threshold slider and notice how points appear and disappear. Click on a bin in the ratio histogram; it will filter the genes so that only the genes that have ratios in that bin are displayed.
Go to the "Plot" submenu in the "Analysis" Menu and then to the "Expression profile" submenu. Then select the "Display gene expr. profile for HP-E". It pops up a window with two buttons "Show HP names" and "Close". Then click on a gene in the image. It will draw the expression profile for that gene. Move the mouse so it is over one of the vertical bars in the plot to get the data for that particular HP. If you click on a different spot in the image it will display the new expression profile in the same window. Click on the "Show HP names" button to pop up a window with the list of HPs matching the numbers in the expression profile plot. Now create a scatter plot of two HPs and then click on a red "+" in the plot. It will update the expression profile plot with this gene. If you want to compare several expression profile plots, repeatedly create the "Display gene expression..." windows from the View menu. Move the windows close to each other so they are easier to compare. Only the last one you created allows you to change the gene. If you don't want a popup window anymore, click on its "Close" button. You may save the scatter plot as a GIF file by pressing the "SaveAs" button, which will save it in the Report subdirectory associated with the startup data.
Next we look at gene clustering. Go to the "Plot" submenu in the "Analysis" menu, and then to the "Cluster plots" submenu, and try "Cluster genes with expression profiles similar to current gene". This pops up a slider with the cluster threshold. Then click on a gene in the image. It will then pop up a text window with the genes whose cluster distance is less than the cluster threshold. It is sorted by minimum cluster distance. Notice that some genes in the image have different size blue boxes around them. The larger the box, the smaller the cluster distance and the more similar the gene. Move the cluster threshold slider. This will change the clustering as seen in both the image and in the cluster popup window. Click on the "Report" button in the text window. This will pop up a report on these genes sorted by minimum cluster distance. Then click on the "EP plot" button. This pops up a scrollable list of expression profile plots for the genes that passed this test so you can review the actual expression profiles. Close this window and set the Filter to a small number of genes such as with "ESTs similar to genes". Note that the genes passing this filter are saved in the E.C.L. and may be saved as a gene subset or part of the data Filter.
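The similar-gene search described above can be sketched as a distance threshold around a seed gene. This is only an illustration: Euclidean distance between expression profiles is assumed as the cluster distance, and the function names, gene names and profile values are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similar_genes(profiles, seed, threshold):
    """Genes whose distance to the seed gene is less than `threshold`.

    Returns (distance, gene) pairs sorted by increasing distance.
    Assumption: Euclidean distance stands in for the cluster distance.
    """
    seed_profile = profiles[seed]
    hits = [
        (euclidean(seed_profile, profile), gene)
        for gene, profile in profiles.items()
        if gene != seed
    ]
    return sorted((d, gene) for d, gene in hits if d < threshold)

# Hypothetical expression profiles over three samples.
profiles = {
    "seed": [1.0, 2.0, 3.0],
    "near": [1.0, 2.0, 4.0],
    "far": [9.0, 9.0, 9.0],
}
result = similar_genes(profiles, "seed", 2.0)
```

Raising the threshold admits more distant genes, which corresponds to moving the cluster threshold slider.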
Turn on one or more Filters to reduce the number of genes to, say, under 100 (e.g. t-test or spot CV filters). Then press the "Go 'Cluster all genes'" button in the cluster window. This is equivalent to invoking the "Cluster counts of Filtered genes by expression profiles" command from the "Cluster plots" submenu. Notice that the Filtered genes have blue circles of different sizes. The larger the circle, the more genes there are that are similar to that gene. Move the cluster threshold slider and note that as the number of similar genes changes, the size of the blue circles changes. As with the other cluster mode, you may generate a report of sorted cluster counts. Click on a gene with the largest green circle. This will then switch you back to single gene clustering mode where you can investigate that gene in more detail.
Next we look at K-means gene clustering. Go to the "Plot" menu and in the "Cluster plots" submenu, try 'Display K-means gene expression profiles for Filtered HP-E' (similar to K-means clustering). This pops up a scroller for "N-clusters", the number of clusters desired, with a default of 6 clusters. It will then pop up a K-means report window with various controls. The centers of the N clusters are indicated with magenta circles where the size corresponds to the number of genes in the cluster. If you click on a gene in the array, it will draw all members of its cluster as green numbers (try this with the scatter plot present as well). Press "EP plot" to scroll through a list of expression profiles of the genes sorted by cluster. Press "Mean EP plot" to scroll through the summary expression profiles of the clusters. Press "Cluster-Report" and "Mn-Cluster-Report" to generate reports for these clusters. You may change the seed gene by changing the current gene (click on a different gene in either the array image or the EP plot). Then press "Recompute" to recompute the clusters. Pressing the "Show HP names" button pops up a list of the samples used in the expression profile. Now add an HP-X vs HP-Y scatter plot. If you select a particular gene, it puts all genes associated with the cluster for that gene in the "Current Cluster" and colors them with a green cluster number instead of a "+".
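For readers unfamiliar with K-means, a generic version of the algorithm is sketched below. It is not MAExplorer's implementation: the farthest-point choice of initial centers starting from the seed gene, the Euclidean distance, and all function and gene names are assumptions made for the example.

```python
import math

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_profile(rows):
    """Column-wise mean of a list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(profiles, seed, k, max_iter=100):
    """Generic K-means over gene expression profiles.

    Assumption: the first center is the seed gene and each further
    center is the gene farthest from all centers chosen so far.
    """
    names = list(profiles)
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of genes")
    centers = [profiles[seed]]
    while len(centers) < k:
        farthest = max(
            names, key=lambda g: min(dist(profiles[g], c) for c in centers)
        )
        centers.append(profiles[farthest])
    assignment = {}
    for _ in range(max_iter):
        # Assign every gene to its nearest center.
        new_assignment = {
            g: min(range(k), key=lambda i: dist(profiles[g], centers[i]))
            for g in names
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Move each center to the mean profile of its members.
        for i in range(k):
            members = [profiles[g] for g in names if assignment[g] == i]
            if members:
                centers[i] = mean_profile(members)
    return assignment, centers

# Four hypothetical genes measured in two samples.
profiles = {
    "a": [0.0, 0.0],
    "b": [0.0, 1.0],
    "c": [10.0, 10.0],
    "d": [10.0, 11.0],
}
assignment, centers = kmeans(profiles, seed="a", k=2)
```

Changing the seed changes the initial centers, which is why recomputing after picking a different seed gene can yield different clusters.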
Next we look at hierarchical gene clustering. Go to the "Plot" menu and in the "Cluster plots" submenu, try "Hierarchical clustering of expression profiles". Then select "Display clustergram of gene expr profiles". This will compute the clustergram for the Filtered genes with the data normalized by the assigned HP-X sample. It will then pop up a clustergram window with various controls. The clustergram is scrollable. You may optionally add the dendrogram with the "Dendrogram" checkbox. Clicking on a row will show data for that gene. Clicking on a box in the clustergram will show data for the particular sample for that gene. You may zoom the dendrogram by repeatedly clicking on the "xxxX DB" zoom button. Press "EP plot" to scroll through a list of expression profiles of the genes sorted the same as the clustergram. Press "ClusterGram Report" to generate a report of the expression profile data sorted the same as the clustergram. Pressing the "Show HP names" button pops up a list of the samples used in the expression profile. You may save the entire clustergram - dendrogram image in a GIF file as before by pressing the "SaveAs" button.
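The dendrogram records the order in which genes and groups of genes are merged. The sketch below shows a generic agglomerative clustering that produces such a merge order. Average linkage and Euclidean distance are assumptions, since this tutorial does not say which linkage MAExplorer uses, and the names and values are hypothetical.

```python
import math
from itertools import combinations

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of gene names.
    """
    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Three hypothetical genes with a single-sample profile each.
profiles = {"g1": [0.0], "g2": [1.0], "g3": [10.0]}
merges = average_linkage(profiles)
```

Reading the merges from first to last corresponds to reading a dendrogram from its leaves toward its root.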
You may perform operations on sets of genes. For example, merge sets of genes found under two different experiments or conditions. Go to the "Sets of genes" submenu in the Edit menu and pick the "List saved gene sets" selection. This lists the default gene sets. Then select "Assign 'User Filter Gene Set'". This will request a gene set to use with the Filter in a pop up dialog box. Select the set for "Genes in current cluster class" that you saved in the previous example. Then press "Ok" in the dialog box. Then select "All genes" in the GeneClass menu. This resets the filter to look at all genes. Select "Filter by 'User Gene Set' membership" in the Filter menu. This restricts the genes to the saved current cluster in the previous example.
Gene set operations may be performed on pairs of gene sets. Select the "Union of 2 gene sets" entry from the "Sets of genes" submenu in the Edit menu. This will request 3 gene set names in a pop up dialog box. Select 'Similar ESTs' for the 1st gene set name, select "Genes in current cluster class" for the 2nd gene set name, and enter "Union of similar ESTs and genes in current cluster" for the new gene set name. Then press "Ok" in the dialog box. This computes the union of the two gene sets into a new gene set. Then select "Filter by 'User Gene Set' membership" as before. This will reset the 'User Gene Set' for the Filter in a pop up dialog box. Select the set for "Union of similar ESTs and genes in current cluster" just saved. Try saving other Filtered gene sets and doing other gene set operations.
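The union and membership filter described above behave like ordinary set operations. The sketch below mirrors the steps with Python sets; the helper function names and the gene names are hypothetical, and only the gene set names come from this tutorial.

```python
def union_of(gene_sets, first, second, new_name):
    """Store the union of two named gene sets under `new_name`."""
    gene_sets[new_name] = gene_sets[first] | gene_sets[second]
    return gene_sets[new_name]

def filter_by_membership(genes, gene_set):
    """Keep only the genes that belong to `gene_set`, preserving order."""
    return [g for g in genes if g in gene_set]

# Hypothetical contents for the two gene sets named in the tutorial.
gene_sets = {
    "Similar ESTs": {"geneA", "geneB"},
    "Genes in current cluster class": {"geneB", "geneC"},
}

union = union_of(
    gene_sets,
    "Similar ESTs",
    "Genes in current cluster class",
    "Union of similar ESTs and genes in current cluster",
)

# Restrict a list of all genes to members of the new set.
all_genes = ["geneA", "geneB", "geneC", "geneD"]
visible = filter_by_membership(all_genes, union)
```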
Go to the "Help" menu. The MAExplorer documentation and glossary as well as other MGAP documents are available and will appear in a new popup Web browser.
If you are running MAExplorer as a stand-alone application, you may save the state of your analysis (but not the gene subsets at this time). Go to the "File" menu and then the "Databases" submenu. Select "Save as file DB". Enter a file name to save your startup state. Then you can restart MAExplorer on this data set at a later time by clicking on this file (in a Windows-based system). Saving the database will also save the state of the data Filter, the gene sets and the condition lists. You can see this by listing them in the Edit menu sub-options after you have restarted a previously saved database.