Reference Manual++ - MAExplorer Microarray Exploratory Data Analysis

The MicroArray Explorer Java tool for data mining microarrays
(Click on figures to show higher resolution versions)

 


*** DRAFT ***
$Date: 2003/11/23 10:29 $ MicroArray Explorer-V0.96.34.01 "Millenium" edition

Peter F. Lemkin
Laboratory of Experimental and Computational Biology
Center for Cancer Research, National Cancer Institute - Frederick, Frederick, MD 21702


SourceForge     http://maexplorer.sourceforge.net/
  or
LECB/NCI       http://www.lecb.ncifcrf.gov/MAExplorer

++ Note: This hypertext manual is divided into chapter and appendix Web pages. These may be printed individually from your Web browser by (1) clicking in the text window to be printed, and (2) using "Print Frame" in Netscape or "Print" in Internet Explorer. Some chapters (e.g., Chapter 2) have many images. The entire manual, with low-resolution figures, may also be downloaded at one time and is suitable for printing from the Web browser. You may also download an Adobe Acrobat PDF version of the entire manual with the lower-resolution figures (~5 MB). The Unix script for creating the full reference manual from the individual HTML pages is CreateMaeFullRefManual.do.

MAExplorer is a Java-based bioinformatics exploratory data-analysis and data-mining program for analyzing sets of quantitative spotted cDNA or oligonucleotide microarray data (Lemkin et al., 2000); see Schulze (2001) for a review of microarray technology.

Prior to its release on SourceForge, MAExplorer was developed by Dr. Peter Lemkin (LECB/NCI-Frederick) with help from Gregory Thornwall (SAIC) and Jai Evans (DECA/CIT, NIH). It was initially created for analyzing 33P-labeled membrane array data from mouse mammary tissue for the Mammary Genome Anatomy Project (MGAP) http://mammary.nih.gov/ with the help of many researchers in the Laboratory of Genetics and Physiology, NIDDK, under Dr. Lothar Hennighausen. Since that early MGAP work, it has been extended to other types of cDNA and oligonucleotide arrays and to various nucleotide labeling methods. These include spotted Cy3/Cy5 glass slides, spotted membranes, non-geometric chip data, and other chip supports with different geometries and numbers of duplicate spots per gene or clone, as well as oligonucleotide chip data such as Affymetrix. A wizard tool called Cvt2Mae was developed to make it easier for other researchers to convert their data to the format required by MAExplorer. Cvt2Mae was developed by Peter Lemkin, Greg Thornwall and Bob Stephens (ABCC/SAIC). You may extend the set of built-in analysis methods by writing Java plugins called MAEPlugins.

This document describes MAExplorer's functionality, provides tutorials, and contains documentation for using it with various types of arrays.

With this program, you may: 1) analyze expression of individual genes; 2) analyze expression of gene families and clusters; 3) compare expression patterns for multiple hybridized samples.

MAExplorer is written in Java and runs as a stand-alone application that you download to your computer. Although MAExplorer began as a Java applet for use with Web browsers for the MGAP Web database ( http://www.lecb.ncifcrf.gov/mae ), we have deprecated its use as an applet because of the many problems with running large Java applets in some Web browsers. Instead, we recommend downloading MAExplorer, which includes the public MGAP array data as a demonstration data set, and then running it on this data after you have installed it on your computer.

Notation: MAExplorer uses the notation that the sample (probe) total mRNA is labeled and then hybridized against the known cDNA targets tethered to the microarray. Because of this notation, we refer to a hybridized sample as an HP. An alternative notation that reverses these terms is also commonly used (see "Chipping Forecast", Nature Genetics supplement, Jan. 1999, p. 1). Also, because arrays may be constructed from either spotted clones or oligonucleotides, we refer to hybridized chip DNA from any of these sources generically as "genes".

Throughout this document we use the abbreviations HP for hybridized sample and GC for gene class. These and other terms are explained in the Glossary and Index. Figures illustrating various features of MAExplorer appear throughout this manual and are presented at low resolution. Click on a low-resolution figure to view its high-resolution version.

NOTE: Because MAExplorer is under development, some of its functionality may occasionally not work as expected. There may also be some problems (mostly broken HTML links) resulting from the migration from LECB/NCI to the SourceForge Web site. Some operations that are under development are labeled "[Future]" in this manual. We welcome your suggestions for improvements as well as reports of problems you encounter. Occasionally the manual or its figures may not be fully in step with the software. Please notify us of problems or suggestions by E-mail so we can try to fix or implement them. If you are a bioinformatics developer interested in working on the MAExplorer project, consider joining the MAExplorer development team on SourceForge.net.



Data from a 38-sample subset of hybridized samples from the MGAP mouse microarray database. This screen illustrates a synthetic pseudoarray image showing the ratios of duplicated grids of genes comparing day 13 pregnancy in C57B6 mouse (sample HP-X 'set') with lactation day 1 (sample HP-Y 'set'). The color scale of the spots is shown on the left, as is the current data normalization mode (median). Genes with white circles are named genes that were selected by the data filter. A scatter plot of this data is shown on the right, with genes passing the data filter indicated as red + symbols and those not passing the filter (i.e., ESTs, calibration DNA, user's genes) shown as gray + symbols. A single gene was selected by clicking on it in the array image; it has a yellow circle (grid 1-D) and a corresponding green circle in the scatter plot. Information on that gene is shown above the array and at the top of the scatter plot. MAExplorer can also be used to view mean data from sets of samples, e.g., day 13 pregnancy in C57B6 (3 HP-X samples) vs. day 1 lactation (4 HP-Y samples), at low or high resolution.
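The legend above describes HP-X/HP-Y ratios displayed under the median normalization mode. As a rough illustration of what such a ratio involves, the following Java sketch assumes that median normalization divides each sample's spot intensities by that sample's median intensity, after which the per-gene ratio of normalized HP-X to normalized HP-Y is taken. The class and method names (`MedianRatioSketch`, `normalizeByMedian`, `ratioXY`) are hypothetical and are not part of MAExplorer; the program's actual normalization (Section 2.4.2) may also apply background correction, scaling constants, or duplicate-spot handling not shown here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not MAExplorer code): median-normalize two samples,
 * then compute per-gene HP-X / HP-Y ratios. Assumes "median normalization"
 * means dividing each intensity by the sample's median intensity.
 */
public class MedianRatioSketch {

    /** Median of the values; the input array is copied, not modified. */
    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("empty sample");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /** Divide every intensity by the sample median so the normalized median is 1.0. */
    static double[] normalizeByMedian(double[] intensities) {
        double m = median(intensities);
        if (m == 0.0) throw new IllegalArgumentException("median is zero");
        double[] out = new double[intensities.length];
        for (int i = 0; i < intensities.length; i++) {
            out[i] = intensities[i] / m;
        }
        return out;
    }

    /** Per-gene ratio of median-normalized HP-X to median-normalized HP-Y. */
    static double[] ratioXY(double[] hpX, double[] hpY) {
        if (hpX.length != hpY.length) throw new IllegalArgumentException("length mismatch");
        double[] nx = normalizeByMedian(hpX);
        double[] ny = normalizeByMedian(hpY);
        double[] ratios = new double[nx.length];
        for (int i = 0; i < nx.length; i++) {
            ratios[i] = nx[i] / ny[i];
        }
        return ratios;
    }

    public static void main(String[] args) {
        // Illustrative intensities for five genes (made-up values, not MGAP data).
        double[] hpX = {100, 200, 300, 400, 500};
        double[] hpY = {50, 100, 150, 400, 250};
        System.out.println(Arrays.toString(ratioXY(hpX, hpY)));
    }
}
```

With these made-up values the HP-X median is 300 and the HP-Y median is 150, so raw intensities that differ by a constant factor end up with a normalized ratio of 1.0, while the fourth gene, which is relatively higher in HP-Y, gets a ratio of 0.5. Normalizing each sample by its own median is what makes the two samples comparable before the ratio is formed.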


Table of Contents

Overview
Menu summary
Quick start

1. Introduction
1.1 Microarrays and notation used with MAExplorer
1.2 Microarray image quantification
   1.2.1 Ratio and Zscore comparison of data from different hybridized samples
1.3 Microarray image and plot display
1.4 Exploratory data analysis - overview
   1.4.1 Saving the state of a data-mining session in stand-alone mode
   1.4.2 Logging messages and command history
1.5 Quick start - demonstration of MAExplorer
1.6 Tutorials for using MAExplorer

2. MAExplorer menus
2.1 File menu
   2.1.1 Databases menu
   2.1.2 Exploratory state menu
   2.1.3 Groupware facility for sharing user states menu
2.2 Samples menu
   2.2.1 Selecting sample HP with chooser or menu sample lists
   2.2.2 Swapping selected samples' (Cy3,Cy5) channels in ratio data dye-swap experiments
   2.2.3 Viewing sample HP-X, HP-Y, and HP-E partitions
   2.2.4 Defining sample condition 'class' names
   2.2.5 Toggling between single HP-X (-Y) samples and HP-X (-Y) sets
2.3 Edit menu
   2.3.1 User edited gene list - the 'Edited Gene List' menu
   2.3.2 Sets of genes menu
   2.3.3 Sets of Sample Conditions menu
   2.3.4 Setting user preferences menu
2.4 Analysis
   2.4.1 GeneClass menu
      2.4.1.1 GeneClass ontology subsets
      2.4.1.2 Simulating Gene Class ontologies using Gene Set operations
   2.4.2 Normalization menu
      2.4.2.1 Intensity background correction
      2.4.2.2 Normalization between microarrays to allow comparison
      2.4.2.3 Using different normalizations to 'see' different data views
   2.4.3 Filter menu
      2.4.3.1 Data filtering using multiple gene data filters
   2.4.4 Plot menu
      2.4.4.1 Show microarray pseudoarray images menu
      2.4.4.2 Scatter plots menu
      2.4.4.3 Histogram plots menu
      2.4.4.4 Expression profile plots menu
   2.4.5 Cluster menu
      2.4.5.1 Cluster genes with expression profiles similar to current gene
      2.4.5.2 Cluster counts of similar filtered genes by expression profiles
      2.4.5.3 K-means clustering of gene expression profiles for filtered genes
      2.4.5.4 Hierarchical clustering of expression profiles
   2.4.6 Report menu
      2.4.6.1 Array report menu - hybridized samples global data
      2.4.6.2 Gene reports menu
      2.4.6.3 Table format menu
      2.4.6.4 Table font size menu
2.5 View menu
   2.5.1 Logging MAExplorer messages
   2.5.2 Logging command history
2.6 Plugins menu
2.7 Help menu

3. Exploratory Data Analysis - Data Mining
3.1 Analysis objectives
   3.1.1 Some experimental design issues of microarray experiments
   3.1.2 Design philosophy of MAExplorer methodology
   3.1.3 Evolution of MAExplorer from earlier proteomic data mining systems
   3.1.4 Concepts used in data mining with MAExplorer
3.2 Steps in an analysis
   3.2.1 Definition of expression profile
   3.2.2 Clustering Methods
      3.2.2.1 Clustering similar genes
      3.2.2.2 K-means clustering
      3.2.2.3 Hierarchical clustering
3.3 Display gene intensity and identification data measurements
3.4 Selecting subsets of genes using the data Filter
3.5 Selecting subsets of hybridized sample conditions
3.6 Setting threshold values using the state-scroller sliders
3.7 Exporting report and plot data

4. Status and Bugs of MAExplorer
4.1 Known Bugs in MAExplorer
   4.1.1 Browser Applet Bugs
   4.1.2 Downloading and Installer Bugs
   4.1.3 Computation speed and display Bugs
   4.1.4 User state and login Status
   4.1.5 Data file names Bug
   4.1.6 Gene Sets Bugs
   4.1.7 Clustering Bugs
   4.1.8 Expression profile Bugs
   4.1.9 Data conversion problems
   4.1.10 Java Plugins bugs
4.2 Revision notes
4.3 Web Browser problems when running MAExplorer as an applet
4.4 Handling fatal error reporting (i.e. DRYROT errors)

Release archive

Acknowledgments

References to related exploratory data analysis methods
R.1 Nucleic Acids Res. paper (PDF)
R.2 Overview (PDF)
R.3 Examples (PDF)
R.4 Using mAdb data with MAExplorer (PDF)
R.5 Introduction to Data Mining with MAExplorer (PDF) or (PPT)
R.6 Using Cvt2Mae to convert array data for use with MAExplorer (PDF)
R.7 Statistics in Functional Genomics workshop paper (PDF)
R.8 Software design of the MAExplorer data mining tool (PDF) or (PPT)

Newsletters

Appendices
A. Short tutorial for MAExplorer
A.1 Demonstration data
A.2 General instructions
A.3 Self-guided tutorial of MAExplorer - notation and examples

B. Advanced tutorial

C. Use of MAExplorer with user's microarray data
C.1 Creating quantified spot data files from hybridized sample arrays
C.2 Table of samples that can be loaded into MAExplorer
C.3 Quantified spot data file format
C.4 GIPO table database file format
C.5 Configuring MAExplorer for use with other arrays
C.6 Using the Cvt2Mae 'wizard' tool to convert array data for use with MAExplorer

D. Use of MAExplorer as a stand-alone application
D.1 Installing MAExplorer as stand-alone application
D.2 Downloading MAExplorer for stand-alone use with other arrays
D.3 Starting MAExplorer by clicking on a .mae file
D.4 The data file format for .mae files
D.5 Using MAExplorer as an Applet on your computer
D.6 List of startup .mae files included in the download installation

E. Design issues
E.1 Internal data structures design to facilitate direct manipulation
E.2 Approaches to data mining: client-centric and server-centric models
E.3 Conversion of microarray data files to MAExplorer format using Cvt2Mae
E.4 Extending MAExplorer functionality using Java Plugins
E.5 Web database server design

Download Installers
Installer information

MAExplorer Plugins

Cvt2Mae wizard

MAExplorer Open Source
Download source
javadocs for source
MPL1.1 Public License
Legal

List of Figures
List of Tables
Glossary of terms used in MAExplorer
Index

Help desk


MAExplorer - Overview

MAExplorer is a bioinformatics microarray data-mining Java application that may help in the discovery of genes regulated in cancer and other diseases. MAExplorer is generally run as a stand-alone application on a local computer. Running as a local application lets it access your local disk to save the state of your data-mining session as well as plots and reports. Using a previously saved data-mining state, you can continue a data-mining session at a later date after exiting the program.

Recommended Hardware

Because data mining is a computationally and graphically intensive activity, a reasonable level of computational resources is required for adequate response. The same Java program runs on a variety of operating systems, including Windows 95/98/Me/NT/2000, Mac OS 8/9/X, Solaris, and Linux, so the choice of platform is not critical. We recommend the following hardware:

Addition of user defined analysis methods using Java Plugins

Users can add their own Java Plugin extensions to MAExplorer. These extend the core MAExplorer program with more sophisticated analysis methods created by users and allow interaction with specialized genomic servers. This is described in Appendix E, Section 2.6, and on the MAExplorer Plugins Web page.