Appendix A. Short tutorial for MAExplorer

This tutorial is for use with MAExplorer, an exploratory data analysis facility for microarray DNA databases. It may be used with any MAExplorer database. Like all tutorials, it is only a starting point, in this case for understanding the data mining analysis environment. Try out new options on your own; you can't break anything :-).

This tutorial lets you

  1. Analyze expression of individual genes
  2. Analyze expression of gene families and clusters
  3. Compare expression patterns in multiple hybridized samples

NOTE: THIS APPENDIX IS BEING REVISED AND EXPANDED...

A.1 Demonstration data

Note that the downloadable MAExplorer stand-alone application includes a subset of 50 hybridized samples from the MGAP database, including a number of startup files for that data (see the list of startup .mae files included in the download installation).

There is also a pre-computed example of an Ordered Condition List using 4 conditions of replicates of C57B6 (pregnancy day 13, lactation days 1 and 10, and stat5a(-,-)), 15 samples. The database also includes 4 additional condition sets of this data and an Ordered Condition List of the 4 conditions (in the State/ directory). This may be used to demo the OCL F-test filter.

If you have access to another MAExplorer database, you can use it instead since the tutorials are fairly generic.

Using the stand-alone application for the tutorial

These same subsets as well as other subsets of the MGAP data are available in the set of .mae startup files distributed with MAExplorer. To access these files,

  1. Start MAExplorer after you have installed it. E.g., in Windows, go to the Windows "Start Menu" and click on MAExplorer. If it is not in your Start Menu, you can go to where you installed it (typically C:\Program Files\MAExplorer) and click on MAExplorer.exe.
  2. After it starts, go to the "Files" menu, select "Open disk DB", and select the startup file you want. Alternatively, you can go directly to the list of startup files (in C:\Program Files\MAExplorer\MAE) and double-click on one of the startup files.

A.2 General instructions:

Throughout this tutorial we refer to condition X and condition Y. These are different hybridized samples in the particular database you have loaded. For example, in the MGAP database X might be lactation and Y might be pregnancy. X and Y 'sets' are multiple samples of these two conditions.

First, select one of the start up databases.

If the particular samples you want to analyze are not listed in that example, after it starts you will be able to add samples you do want and remove samples you don't want, regardless of which example was initially used, provided the "Samples" database contains additional hybridized samples.

When it starts, a main window will pop up. It then downloads the gene database tables and the particular hybridized samples you specified. When it is ready for you to begin interaction, the menu bar will become active and it will display a green "Ready - click on a gene to query database" message. Depending on your Internet connection speed, it may take a few minutes to set up. If you are running MAExplorer as a stand-alone application and it is getting data from your local disk, startup will be much faster.

Second, go to section A.3 (the self-guided tutorial) below for instructions on what to do next.

HINT: print this tutorial page and then read the following instructions from the printout rather than trying to keep this window visible. You might also print the parts of the MAExplorer Reference Manual for the same reason.

HINT: You might want to keep a record of the commands you have used or of the messages and measurements you have made. To do this you need to enable message and command history logging. Go to the "View" pull-down menu and then select the type of logging you want using the "Show log of messages" or the "Show log of command history" commands.

NOTE: On computers with low resolution (i.e. less than 1024 X 780) you may need to resize the windows and move them to different parts of the screen to view them simultaneously.


A.3 Self-guided tutorial of MAExplorer - notation and examples

The following is a self-guided tutorial (you issue the commands) that illustrates some of the data analysis capabilities. In the following examples, the notation "go to A:B:C" means go to pull-down menu A, then submenu B, and then make selection C. "Selecting a gene" from the microarray image or scatter plot means clicking on a spot in the pseudoarray image or a point in any of the plots.


A.3.1 Review of types of gene data available in the database

step 1: go to Analysis: GeneClass: All genes
           the array shows all genes with white circles.
step 2: go to Analysis: GeneClass: All named genes
           the array shows named genes with white circles.
step 3: go to Analysis: GeneClass: ESTs similar to genes
           the array shows ESTs similar to named genes with white circles.
step 4: go to Analysis: GeneClass: ESTs
           the array shows unknown ESTs with white circles.
step 5: go to Analysis: GeneClass: All genes and ESTs
           the array shows all named genes and all ESTs with white circles.
step 6: go to Analysis: GeneClass: Replicate genes
           the array shows replicate genes having at least 2 copies in the
           array with white circles.
step 7: go to Analysis: GeneClass: Calibration DNA
           the array shows calibration DNA (if present) with white circles.
step 8: go to Analysis: GeneClass: Your plates
           the array shows clones from user's plates (if present) with white circles.


A.3.1.1 Analysis of the expression of a single known gene

  1. ratio between two conditions X and Y (HP-X, HP-Y)
  2. expression profile of a set of conditions (HP-E) (see Example A.3.1.7)
step 1: click on the blue "Enter gene name" button to pop up a name entry window
step 2: start typing gene name into blue text entry window
step 3: once gene names appear, click on gene of choice
step 4: press "Done" button in pop up window
           A yellow circle will define the gene as the "current gene" in the microarray
           pseudoarray image (info on the gene is also provided in the status area above the array).
           If there are replicate grids in the array (HP), the left and right fields of
           repeated genes are denoted by F1 and F2. The mean(HP-X,HP-Y) values and the
           (HP-X/HP-Y) values for the specified gene are reported.
step 5: alternatively, click on an array spot of choice to define any gene
           in the array as the new current gene


A.3.1.2 Find a subset of genes with a common substring (e.g. *ONCO*)

step 1: click on the blue "Enter gene name" button to pop up a name entry window
step 2: start typing "*ONCO*" (without the quotes) into blue text entry window
step 3: once gene names appear, press "Set E.G.L." button in pop up window
           Magenta squares will indicate these genes in the pseudoarray image.
           These include the 'onco'genes and the proto-'onco'genes
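
The wildcard search above can be pictured with a minimal Python sketch. It is not MAExplorer's code: the gene names are made up, and the case-insensitive matching is an assumption.

```python
from fnmatch import fnmatchcase

def match_gene_names(pattern, names):
    """Return the names matching a '*' wildcard pattern, ignoring case (assumed)."""
    pat = pattern.upper()
    return [n for n in names if fnmatchcase(n.upper(), pat)]

# Hypothetical gene names, for illustration only.
names = ["oncogene A", "proto-oncogene B", "actin", "ONCOSTATIN M"]

# "*ONCO*" matches any name that contains the substring "onco".
print(match_gene_names("*ONCO*", names))
```

Dropping the leading "*" (i.e. "ONCO*") would keep only the names that start with the substring.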

A.3.1.3 Two conditions - scatter plots:

Create a scatter plot of two hybridized samples where condition X data is on the X axis and condition Y data on the Y axis.

step 1: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y.
           then click on yellow circle in scatter plot to get HP-X/HP-Y ratio for the gene
step 2: click on any point in the scatter plot
           this is another way to define any gene in the plot as the new current gene
step 3: zoom in on a region of the plot using the vertical or horizontal scroll bars
step 4: click on another point in the scatter plot to get the HP-X/HP-Y ratio for another gene
step 5: press "Close" button to remove pop up window


A.3.1.4 Scatter plot of Cy3 vs Cy5 or replicate spots (F1 vs F2) of one sample

Create a scatter plot of the Cy3 vs Cy5 channels or of the replicate spot F1, F2 data if your database contains (Cy3,Cy5) ratio data or replicate spot fields (F1,F2).

step 1: go to Analysis: Plot: Scatter plots: Cy3 vs. Cy5
           or go to Analysis: Plot: Scatter plots: F1 vs. F2
           Then, click on green circle in scatter plot to get Cy3/Cy5 ratio for the gene
           or F1/F2 ratio for replicate spots for that gene
step 2: click on any point in the scatter plot
           this is another way to define any gene in the plot as the new current gene
step 3: zoom in on a region of the plot using the vertical or horizontal scroll bars
step 4: click on another point in the scatter plot to get the Cy3/Cy5 (or F1/F2) ratio for another gene

If you are working with Cy3/Cy5 dye-swap data, you may swap the Cy3/Cy5 channel data to Cy5/Cy3 for any selected subset of samples. This may make it easier to use the data in various ways when data mining. If you do not have this type of data, go to step 7.

step 5': go to Samples: Edit (Cy5/Cy3) else use (Cy3/Cy5) menu
step 6': select the samples you wish to swap and press "Done". This
           enables you to see the swapped results in the scatter plot
step 7: press "Close" button to remove pop up window


A.3.1.5 Filter by expression ratio between two conditions X and Y

step 1: go to Analysis: Plot: Histograms: HP-X/HP-Y
           the histogram shows the ratios
step 2: move pop up plot so you can see it and the array simultaneously
step 3: choose (click on) a ratio bin
           genes filtered by the ratio range of the bin will light up on the array ('+'s)
step 4: click on different bin in the histogram to select another bin
step 5: click on word "Freq" on left in histogram to remove the histogram bin filter
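
Clicking a histogram bin amounts to keeping the genes whose X/Y ratio falls inside that bin's range. The following is a minimal Python sketch of that idea, not MAExplorer's implementation; the gene ids, the intensities and the half-open bin convention [lo, hi) are assumptions.

```python
def genes_in_ratio_bin(hp_x, hp_y, lo, hi):
    """Return the gene ids whose X/Y ratio lies in the half-open bin [lo, hi)."""
    selected = []
    for gene, x in hp_x.items():
        y = hp_y.get(gene)
        if y is None or y <= 0:
            continue  # ratio is undefined for this gene, so skip it
        if lo <= x / y < hi:
            selected.append(gene)
    return selected

# Hypothetical intensities for two hybridized samples.
hp_x = {"g1": 1200.0, "g2": 300.0, "g3": 50.0, "g4": 10.0}
hp_y = {"g1": 400.0, "g2": 310.0, "g3": 200.0, "g4": 0.0}

print(genes_in_ratio_bin(hp_x, hp_y, 2.0, 4.0))  # genes in the bin covering ratios 2 to 4
```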

Note of caution: if the signal is close to background, the X/Y ratio may be bogus.
You can filter out low intensity genes by filtering on the spot intensity range (see A.3.1.6).


A.3.1.6 Filter by spot intensity range

step 1: go to Analysis: Filter: Filter by spot intensity [SI1:SI2] sliders: Use spot intensity [SI1:SI2] sliders
step 2: adjust intensity lower bound (SI1) to remove low intensity genes
step 3: when done, remove the 'Filter by intensity sliders' by toggling it off (redo step 1 to toggle it off)
step 4: repeat steps 1-3, but this time use Filter : Filter by [I1:I2] sliders :
           Use spot intensity (or Cy3/Cy5) [I1:I2] sliders
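
The effect of the [SI1:SI2] sliders can be sketched in Python as a simple range test. This is an illustration under assumptions, not MAExplorer's code: the intensities are invented and the inclusive bounds are a guess.

```python
def filter_by_intensity(intensity, si1, si2):
    """Return the gene ids whose spot intensity lies in [si1, si2] (bounds assumed inclusive)."""
    return [gene for gene, value in intensity.items() if si1 <= value <= si2]

# Hypothetical spot intensities for one hybridized sample.
intensity = {"g1": 1200.0, "g2": 35.0, "g3": 480.0, "g4": 8.0}

# Raising the lower bound SI1 removes spots that are close to background.
print(filter_by_intensity(intensity, 50.0, 5000.0))
```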

A.3.1.7 Multiple conditions - expression profile plots of HP-E data:

step 1: go to Analysis: Plot: Expression profile: Display a gene's expression profile
step 2: after the expression profile window pops up, click on a gene in array to see its profile
step 3: click on a line in the profile plot to see its intensity
step 4: click on a different gene in the array to see its profile
step 5: press "Show HPs" button to see the list of samples used
step 6: press "Close" button to remove pop up windows



A.3.2 Changing the normalization between hybridized samples

You may change the normalization method used to scale data between hybridized samples so they may be compared.

A.3.2.1 Set normalization

step 1: go to Analysis: Normalization: Median intensity
step 2: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
           to see the effect of normalization on the scatter plot. Note how outliers appear.
step 3: go to Analysis: Normalization: Zscore of intensity
step 4: go to Analysis: Normalization: Zscore of log intensity, stdDev
step 5: go to Analysis: Normalization: Unnormalized
           this does not scale data between samples.
step 6: go to Analysis: Normalization: Median intensity
           this leaves the normalization method in Median mode.
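
To give a feel for what these normalization choices do, here is a minimal Python sketch. The formulas are the textbook ones (divide by the median; subtract the mean and divide by the standard deviation) and are assumed rather than taken from MAExplorer; the sample intensities are invented.

```python
import math
import statistics

def median_normalize(values):
    """Divide each intensity by the median intensity of the sample."""
    med = statistics.median(values)
    return [v / med for v in values]

def zscore(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def zscore_log(values):
    """Z-score of the log intensities (every value must be greater than 0)."""
    return zscore([math.log(v) for v in values])

# Hypothetical raw intensities for one hybridized sample.
sample = [100.0, 200.0, 400.0, 800.0, 1600.0]

print(median_normalize(sample))  # the median spot becomes 1.0
print(zscore_log(sample))        # values are centred on 0
```

After either scaling, two samples hybridized at different overall brightness land on a comparable scale, which is why the scatter plot changes when you switch methods.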



A.3.3 Analysis of the expression profiles of gene classes

You may restrict the set of genes by Gene Class. Several built in gene classes are defined. You may also set up additional ones and filter by those (not covered in this short tutorial).

A.3.3.1 Filter by gene class membership

step 1: go to Analysis: GeneClass: All known genes
           the array only shows named genes (additional gene subclasses are being added)
step 2: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
           to see the two condition expression of just these genes
step 3: go to Analysis: Plot: Expression profiles: Display Filtered genes expression profiles
           to see the multiple condition expression of just these genes. This may take a
           while if there are many genes
step 4: you can click on a line in any of the plots to see the samples' intensity value for that gene
step 5: when done, press "Close" button in all pop up plot windows

A.3.3.2 Gene Reports

step 1: go to Analysis: Report: Gene reports: Filtered genes: Genes passing Filter
           Clicking on a blue entry will bring up the I.M.A.G.E., dbEST, UniGene, GenBank,
           LocusLink, or mAdb Clone database in a pop up Web page
step 2: press "Close" button in report, and close this pop up Web page
step 3: go to Analysis: Report: Table format: Tab-delimited
           to enable creating Excel-compatible reports

A.3.3.3 Exporting Gene Reports to Excel

step 1: repeat step 1 of the Gene Report, but this time to make a text-formatted report
step 2: cut the text from this window and paste it into an Excel window.
           This is useful for exporting data if you are on a Windows PC
step 3: go to Analysis: Filter: all genes
           this restores the filter from all named genes back to all of the genes
step 4: go to Analysis: Report: Table format: Spreadsheet
step 5: press "Close" button in report



A.3.4 Analysis of the expression profile of multiple hand picked genes

Users can manually define a set of genes which are kept in the Edited Gene List (E.G.L.). Various operations can then use the E.G.L. to restrict the set of data being analyzed.


A.3.4.1 Define a list of edited genes, then plot all their expression profiles at one time

step 1: go to View: Show 'Edited Gene List'
           this turns on the 'Edited Gene List' magenta square box overlays
step 2: hold CONTROL key and click on genes in array to add a gene
step 2': hold SHIFT key and click on genes in array to delete a gene.
           This lets you edit a list of genes. It also works when clicking in a scatter plot
step 3: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
           to see the Edited Gene List in the scatter plot
step 4: try defining (or removing) E.G.L. genes in the scatter plot by holding the
           CONTROL (or SHIFT) key when clicking on points in the scatter plot

A.3.4.2 Filtering by edited gene list

step 1: go to Analysis: Filter: Filter by 'Edited Gene List'
step 2: go to Analysis: Plot: Expression profiles: Display Filtered genes expression profiles
           scroll through the plots to see all of the profiles
step 3: go to Analysis: Filter: Filter by 'edited gene list'
           this turns off the 'edited gene list' filter
step 4: press "Close" button in expression profiles window

A.3.4.3 Report of edited gene list

step 1: go to Analysis: Report: Gene report: genes in 'edited gene list'
           reports edited genes
step 2: press "Close" button in report
step 3: go to Analysis: Filter: Filter by 'edited gene list'
           this turns off the 'edited gene list' filter
step 4: go to View: Show 'edited gene list'
           this turns off the 'edited gene list' squares overlay



A.3.5 Identify a cluster of genes with similar expression profile to the current selected gene

step 1: go to Analysis: GeneClass: All named genes and ESTs
step 2: go to Analysis: Plot: Cluster plots: Cluster genes with expression profiles similar to current gene
           this will pop up a cluster summary and cluster distance slider control window.
           Move the summary and slider windows so you can see all 3 windows. The size of
           the cyan boxes on similar genes in the pseudoarray is proportional to the similarity.
           Adjust the cluster distance slider to smaller values and note how the number of genes clustered decreases.
           It should be set for a reasonable number considering the material you are analyzing.
step 3: select (click on) a new current gene
           the genes which belong to that cluster are labeled in the array with cyan boxes
           and are defined as the "current cluster". The current gene you click on has
           a green circle around it
step 4: press "Cluster Report" button in the cluster summary
           this pops up a Gene Report for the clustered genes
step 5: press "Close" button in the report
step 6: press "EP plot" button in the cluster summary
           this pops up a scrollable list of expression profile plots sorted by similarity
           to the current selected gene.
step 7: press "Close" button in the expression profile plot window
step 8: press "Close" button in the cluster summary
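
The idea behind the cluster distance slider can be sketched in Python: keep every gene whose expression profile lies within a threshold distance of the current gene's profile, closest first. This is only an illustration. The Euclidean distance, the gene ids and the profiles are assumptions, not MAExplorer's actual metric or data.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similar_genes(profiles, current, threshold):
    """Return (gene, distance) pairs within threshold of the current gene, closest first."""
    ref = profiles[current]
    hits = [(g, euclidean(ref, p)) for g, p in profiles.items() if g != current]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])

# Hypothetical normalized profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 2.0, 3.5, 4.0],
}

print([g for g, _ in similar_genes(profiles, "g1", 1.0)])
print([g for g, _ in similar_genes(profiles, "g1", 0.3)])
```

Lowering the threshold from 1.0 to 0.3 shrinks the cluster, which mirrors what happens when the slider is moved to smaller values.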



A.3.6 Identify clusters of genes with similar expression under various conditions using data mining filters

step 1: go to Analysis: GeneClass: ESTs similar to genes
step 2: go to Analysis: Plot: Cluster plots: K-means clustering of gene expression profiles
           this will pop up a cluster summary and slider control window. Move the summary
           and slider windows so you can see all 3 windows. The size of the
           magenta circles in the array is proportional to # genes/cluster
step 3: select (click on) a new current gene
           the genes which belong to that cluster are labeled in the array with tiny green
           numbers and are defined as the "current cluster". The current gene you click
           on has a green circle around it
step 4: go to View: Show 'edited gene list'
           genes in the current cluster were also copied to the edited gene list
step 5: go to Analysis: Report: Gene report: genes in 'edited gene list'
           reports genes in the current cluster
step 6: press "Close" button in report
step 7: go to View: Show 'edited gene list'
           this turns off the 'edited gene list' squares overlay

A.3.6.1 Varying the number of clusters

step 1: vary the "# of clusters" slider value from 6 to 10, then 20
           note the number of clusters changes and the gene cluster composition also changes

A.3.6.2 Defining a new cluster "seed" to recluster the genes

step 1: select a new current gene in array and press the "Recompute clusters" button
           this recomputes the clusters using the current gene as the new seed gene
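
For readers unfamiliar with K-means, the following Python sketch shows the general technique with the current gene used as the first cluster centre. It is a simplified illustration, not MAExplorer's algorithm: the way the remaining centres are picked (the gene farthest from the centres chosen so far), the Euclidean distance and the data are all assumptions.

```python
import math

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(profiles, seed, k, iterations=10):
    """Cluster profiles into k groups; returns {gene: cluster index}."""
    genes = list(profiles)
    # Start from the seed gene, then repeatedly add the gene farthest from all centres so far.
    centres = [list(profiles[seed])]
    while len(centres) < k:
        far = max(genes, key=lambda g: min(dist(profiles[g], c) for c in centres))
        centres.append(list(profiles[far]))
    assign = {}
    for _ in range(iterations):
        # Assign each gene to its nearest centre.
        assign = {g: min(range(k), key=lambda i: dist(profiles[g], centres[i])) for g in genes}
        # Move each centre to the mean profile of its members.
        for i in range(k):
            members = [profiles[g] for g in genes if assign[g] == i]
            if members:
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical two-condition profiles forming two obvious groups.
profiles = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0],
            "d": [10.0, 11.0], "e": [0.5, 0.5]}

print(kmeans(profiles, "a", 2))
```

Changing the seed gene or the number of clusters k changes the starting centres, which is why the cluster composition changes when you recompute.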


A.3.6.3 Cluster expression profile plots

step 1: press "EP plot" button and scroll down the list after they appear
           the primary nodes for each cluster are indicated with red labels in the set of
           profiles, and the other genes are labeled with their cluster number
step 2: press "Mean EP plot" button and scroll down the list after they appear
           these are the mean expression profile plots of the primary node clusters.

A.3.6.4 Report of all clusters

step 1: press the "Cluster-Report" button to get a sorted cluster list
           scroll the spreadsheet to the right to see the cluster statistics
step 2: press the "Mn-Cluster-Report" button to get a sorted cluster list
           scroll the spreadsheet to the right to see the mean expression profiles
step 3: press "Close" button in pop up windows

A.3.6.5 Current cluster in scatter plot

step 1: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
step 2: move the plot so you can see both scatter plot and array
step 3: click on a gene in the cluster or on spots in the scatter plot
           note that the green cluster numbers are drawn in the scatter plot
step 4: go to Edit: Sets of genes : Save 'Edited gene list' as gene sets
           this will pop up a dialog box requesting "Enter new gene set name"
step 5: type "Genes in current cluster class"
           this will save the current cluster in a gene set. This gene set will
           be used in the next example
step 6: press "Close" button in pop up windows
step 7: (optionally) investigate hierarchical clustering with clustergrams and
           dendrograms by going to Analysis: Plot: Cluster plots: Hierarchical clustering plot for HP-E



A.3.7 User Gene Set operations

You may manipulate sets of genes. Some of these are predefined for you by the database (e.g. All named genes, ESTs, etc.). Others are defined by particular operations (E.G.L., clustering, etc.). Still others may be defined by you using logical operations on these sets (OR, AND, DIFFERENCE).

A.3.7.1 List of the current gene sets

step 1: go to Edit: Sets of genes : List saved gene sets
           this lists the current list of gene sets
step 2: change the E.G.L. set of genes and note how the # of E.G.L. genes changes in the list.
           You can add (remove) genes to the E.G.L. by clicking on a spot in the array while the
           CONTROL (SHIFT) key is held down.

A.3.7.2 Filter by user defined gene set

step 1: go to Edit: Sets of genes : Set 'User Filter Gene Set' (for Filter)
           this will request a gene set to use with the Filter in a pop up dialog box.
           Enter gene set # for the set for "Genes in current cluster class" which you saved
           in the previous example.
           then press "Ok" in the dialog box.
step 2: go to Analysis: GeneClass: All genes and ESTs
           this resets the filter to look at all genes and ESTs
step 3: go to Analysis: Filter: Filter by 'User Gene Set' membership
           this restricts the genes to the saved current cluster in the previous example


A.3.7.3 Gene set operations

step 1: go to Edit: Sets of genes : OR (Union) of 2 gene sets
           this will request 3 gene set names in a pop up dialog box.
           Enter set # for (All known genes) for the 1st gene set name,
           Enter set # for (Genes in current cluster class) for the 2nd gene set name,
           Enter "Union of known genes and genes in current cluster" for new gene set name.
           then press "Ok" in the dialog box.
           this computes the union of the two gene sets into a new gene set
step 2: go to Edit: Sets of genes : Set 'User Filter Gene Set'
           this will reset the 'User Filter Gene Set' for the Filter in a pop up dialog box.
           Enter the set number or the beginning of the set name 'Union' that is the
           set for "Union of known genes and genes in current cluster" just saved.
step 3: try saving other Filtered genes sets and doing other gene set operations.
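
The three logical operations (OR, AND, DIFFERENCE) behave like ordinary set operations. A minimal Python sketch, using made-up gene ids and set names rather than anything from a real database:

```python
# Hypothetical gene sets; the ids are for illustration only.
known_genes = {"g1", "g2", "g3"}
current_cluster = {"g3", "g4"}

union = known_genes | current_cluster         # OR: genes in either set
intersection = known_genes & current_cluster  # AND: genes in both sets
difference = known_genes - current_cluster    # DIFFERENCE: in the first set but not the second

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that DIFFERENCE depends on the order of the two sets, while OR and AND do not.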



A.4 Additional tutorials

If you wish to investigate MAExplorer in more detail, try some of the suggested examples in the
advanced tutorial (Appendix B) in the reference manual.