Note: This hypertext manual is divided into chapters and appendices, each on its own Web page. These may be printed individually from your Web browser by (1) clicking in the text window to be printed, and (2) using "Print Frame" in Netscape or "Print" in Internet Explorer. Some of the chapters (e.g., 2) have many images. The entire manual may be downloaded at one time with low-resolution figures and is suitable for printing in the Web browser. You may also download an Adobe Acrobat PDF version of the entire manual with the lower-resolution figures (~5Mb). The Unix script for creating the full reference manual from the individual HTML pages is CreateMaeFullRefManual.do.
The MAExplorer is a Java-based bioinformatics exploratory data-analysis and data-mining program for analyzing sets of quantitative spotted cDNA or oligonucleotide microarray data (Lemkin et al., 2000) - (see (Schulze, 2001) for a review of microarray technology).
Prior to its release on SourceForge, MAExplorer was developed by Dr. Peter Lemkin (LECB/NCI-Frederick) with help from Gregory Thornwall (SAIC) and Jai Evans (DECA/CIT, NIH). It was initially created for analyzing 33P labeled membrane array data from mouse mammary tissue from the Mammary Genome Anatomy Project (MGAP) http://mammary.nih.gov/ with the help of many researchers in the Laboratory of Genetics and Physiology, NIDDK under Dr. Lothar Hennighausen. Since the early work with MGAP, it has been extended to work with other types of cDNA and oligo arrays and various nucleotide labeling methods. These include spotted Cy3/Cy5 glass slides, spotted membranes, non-geometric chip data, and other chip supports with different geometries and numbers of duplicate spots/gene, clones as well as oligo chip data such as Affymetrix. A wizard tool called Cvt2Mae was developed to make it easier for other researchers to convert their data to the format required by MAExplorer. Cvt2Mae was developed by Peter Lemkin, Greg Thornwall and Bob Stephens (ABCC/SAIC). You may extend the set of built-in analysis methods by writing Java plugins called MAEPlugins.
This document describes MAExplorer's functionality, provides tutorials, and contains documentation for using it with various types of arrays.
With this program, you may: 1) analyze expression of individual genes; 2) analyze expression of gene families and clusters; 3) compare expression patterns for multiple hybridized samples.
MAExplorer is written in Java and runs as a stand-alone application that you download to your computer. Although MAExplorer began as a Java applet for use with Web browsers for the MGAP Web database ( http://www.lecb.ncifcrf.gov/mae ), we have deprecated its use as an applet because of many problems with running large Java applets in some Web browsers. Instead, we recommend downloading MAExplorer, which includes the public MGAP array data as a demonstration data set, and then running it on this data after you have installed it on your computer.
Notation: MAExplorer uses the notation that the sample
probe total mRNA is labeled and then hybridized against the
known cDNA targets tethered to the microarray. Because of this
notation, we refer to a hybridized sample as an HP. An alternative
notation that reverses these terms is also commonly used (see
"Chipping Forecast", Nature Genetics supplement, Jan, 1999,
pg 1). Also, because arrays may be constructed from either spotted
clones or oligonucleotides, we refer to hybridized chip DNA from any
of these sources generically as "genes".
Throughout this document we use the abbreviations HP for hybridized sample and GC for gene class. These and other terms are explained in the Glossary and Index. There are a number of figures and tables illustrating various features of MAExplorer throughout this manual. Figures are presented at low resolution. Click on a low-resolution figure to view the high-resolution version.
NOTES: Because MAExplorer is under development, there may be occasional problems with some of its functionality. There may also be some problems (mostly bad HTML links) with migrating from LECB/NCI to the SourceForge Web site. Some operations that are under development are labeled with "[Future]" in this manual. We welcome your suggestions for improvements as well as reports of problems that you encounter. Occasionally the manual or the figures in the manual may not be quite in phase with the software. Please notify us of problems or suggestions by E-mail so we can try to fix or implement them. If you are a bioinformatics developer and would be interested in working on the MAExplorer project, consider joining the MAExplorer development team on SourceForge.net.
2. MAExplorer menus
3. Exploratory Data Analysis - Data Mining
4. Status and Bugs of MAExplorer
References to related exploratory data analysis methods
Appendices
C. Use of MAExplorer with user's microarray data
D. Use of MAExplorer as a stand-alone application
E. Design issues
Download Installers
MAExplorer Open Source
List of Figures
**Icon Legend
Data from a 38 sample subset of hybridized samples from the MGAP mouse microarray
database. This screen illustrates a synthetic pseudoarray image
showing the ratios of duplicated grids of genes comparing day 13
pregnancy in C57B6 mouse (sample HP-X 'set') with Lactation day 1
(sample HP-Y 'set'). The color scale of the spots is indicated on the
left as is the current data normalization mode (median). Genes with
white circles are named genes and were selected by the data filter. A
scatter plot of this data is shown on the right with genes passing the
data filter indicated as red + and those not passing the filter
(i.e. ESTs, calibration DNA, user's genes) shown as gray + symbols. A
single gene was selected by clicking on it in the array image and has
a yellow circle (grid 1-D) and a corresponding green circle in the
scatter plot. Information on that gene is indicated above the array
and at the top of the scatter plot. MAExplorer can also be used to
view mean data from sets of samples e.g. Day 13 pregnancies
from C57B6 (3 HP-X samples) vs. Day 1 Lactation (4 HP-Y samples) at low or high resolution.
Table of Contents
Menu summary
Quick start
1.1 Microarrays and notation used with MAExplorer
1.2 Microarray image quantification
1.2.1 Ratio and Zscore comparison of data from different hybridized samples
1.3 Microarray image and plot display
1.4 Exploratory data analysis - overview
1.4.1 Saving the state of a data-mining session in stand-alone mode
1.4.2 Logging messages and command history
1.5 Quick start - demonstration of MAExplorer
1.6 Tutorials for using MAExplorer
2.1 File menu
2.1.1 Databases menu
2.1.2 Exploratory state menu
2.1.3 Groupware facility for sharing user states menu
2.2 Samples menu
2.2.1 Selecting sample HP with chooser or menu sample lists
2.2.2 Swapping selected samples' (Cy3,Cy5) channels in ratio data dye-swap experiments
2.2.3 Viewing sample HP-X, HP-Y, and HP-E partitions
2.2.4 Defining sample condition 'class' names
2.2.5 Toggling between single HP-X (-Y) samples and HP-X (-Y) sets
2.3 Edit menu
2.3.1 User edited gene list - the 'Edited Gene List' menu
2.3.2 Sets of genes menu
2.3.3 Sets of Sample Conditions menu
2.3.4 Setting user preferences menu
2.4 Analysis
2.4.1 GeneClass menu
2.4.1.1 GeneClass ontology subsets
2.4.1.2 Simulating Gene Class ontologies using Gene Set operations
2.4.2 Normalization menu
2.4.2.1 Intensity background correction
2.4.2.2 Normalization between microarrays to allow comparison
2.4.2.3 Using different normalizations to 'see' different data views
2.4.3 Filter menu
2.4.3.1 Data filtering using multiple gene data filters
2.4.4 Plot menu
2.4.4.1 Show microarray pseudoarray images menu
2.4.4.2 Scatter plots menu
2.4.4.3 Histogram plots menu
2.4.4.4 Expression profile plots menu
2.4.5 Cluster menu
2.4.5.1 Cluster genes with expression profiles similar to current gene
2.4.5.2 Cluster counts of similar filtered genes by expression profiles
2.4.5.3 K-means clustering of gene expression profiles for filtered genes
2.4.5.4 Hierarchical clustering of expression profiles
2.4.6 Report menu
2.4.6.1 Array report menu - hybridized samples global data
2.4.6.2 Gene reports menu
2.4.6.3 Table format menu
2.4.6.4 Table font size menu
2.5 View menu
2.5.1 Logging MAExplorer messages
2.5.2 Logging command history
2.6 Plugins menu
2.7 Help menu
3.1 Analysis objectives
3.1.1 Some experimental design issues of microarray experiments
3.1.2 Design philosophy of MAExplorer methodology
3.1.3 Evolution of MAExplorer from earlier proteomic data mining systems
3.1.4 Concepts used in data mining with MAExplorer
3.2 Steps in an analysis
3.2.1 Definition of expression profile
3.2.2 Clustering Methods
3.2.2.1 Clustering similar genes
3.2.2.2 K-means clustering
3.2.2.3 Hierarchical clustering
3.3 Display gene intensity and identification data measurements
3.4 Selecting subsets of genes using the data Filter
3.5 Selecting subsets of hybridized sample conditions
3.6 Setting threshold values using the state-scroller sliders
3.7 Exporting report and plot data
4.1 Known Bugs in MAExplorer
4.1.1 Browser Applet Bugs
4.1.2 Downloading and Installer Bugs
4.1.3 Computation speed and display Bugs
4.1.4 User state and login Status
4.1.5 Data file names Bug
4.1.6 Gene Sets Bugs
4.1.7 Clustering Bugs
4.1.8 Expression profile Bugs
4.1.9 Data conversion problems
4.1.10 Java Plugins bugs
4.2 Revision notes
4.3 Web Browser problems when running MAExplorer as an applet
4.4 Handling fatal error reporting (i.e. DRYROT errors)
R.1 Nucleic Acids Res. paper (PDF)
R.2 Overview (PDF)
R.3 Examples (PDF)
R.4 Using mAdb data with MAExplorer (PDF)
R.5 Introduction to Data Mining with MAExplorer (PDF) or (PPT)
R.6 Using Cvt2Mae to convert array data for use with MAExplorer (PDF)
R.7 Statistics in Functional Genomics workshop paper (PDF)
R.8 Software design of the MAExplorer data mining tool (PDF) or (PPT)
A. Short tutorial for MAExplorer
A.1 Demonstration data
A.2 General instructions
A.3 Self-guided tutorial of MAExplorer - notation and examples
C.1 Creating quantified spot data files from hybridized sample arrays
C.2 Table of samples that can be loaded into MAExplorer
C.3 Quantified spot data file format
C.4 GIPO table database file format
C.5 Configuring MAExplorer for use with other arrays
C.6 Using the Cvt2Mae 'wizard' tool to convert array data for use with MAExplorer
D.1 Installing MAExplorer as stand-alone application
D.2 Downloading MAExplorer for stand-alone use with other arrays
D.3 Starting MAExplorer by clicking on a .mae file
D.4 The data file format for .mae files
D.5 Using MAExplorer as an Applet on your computer
D.6 List of startup .mae files included in the download installation
E.1 Internal data structures design to facilitate direct manipulation
E.2 Approaches to data mining: client-centric and server-centric models
E.3 Conversion of microarray data files to MAExplorer format using Cvt2Mae
E.4 Extending MAExplorer functionality using Java Plugins
E.5 Web database server design
Installer information
Download source
javadocs for source
MPL1.1 Public License
Legal
List of Tables
Glossary of terms used in MAExplorer
Index
MAExplorer - Overview
MAExplorer is a bioinformatics microarray data mining Java application
that may help in the discovery of genes regulated in cancer and other
diseases. MAExplorer is generally run as a stand-alone application on a
local computer. By running as a local application, it is able to
access your local disk to save the state of your data mining session
as well as plots and reports. Using the previously saved data mining
state, you can continue a data-mining session at a later date after
exiting the program.
Recommended Hardware
Because data mining is a computationally and graphically intensive
activity, a reasonable level of computational resources is required for
adequate response. The same Java program runs on a variety of
operating systems including Windows 95/98/Me/NT/2000, Macintosh
OS8/9/X, Solaris, Linux, etc. so the choice of computer is not that
critical. We recommend the following hardware:
Addition of user defined analysis methods using Java Plugins
We have provided the ability for users to add their own Java Plugin
Extensions to MAExplorer. These extend the capabilities of the core
MAExplorer program to other more sophisticated analysis methods
created by users and allow interaction with specialized genomic
servers. This is described in Appendix E, Section 2.6, and in the MAExplorer Plugins Web page.
1. Introduction
This hyperlinked manual provides a detailed description of the
MAExplorer conventions (Section 1) and operation (Section 2). The latter contains
many figures of computer screens showing the operations described in
the Section. Section 3
discusses typical scenarios in using MAExplorer for data-mining
microarrays and contains a brief introduction to the process of data
mining. Section 4 lists currently
known bugs and the revision history. Appendix A is a short
tutorial. Appendix B is a more
advanced tutorial. Appendix
C describes the files required by MAExplorer and how they may be
created for using MAExplorer with other array data. It also describes
the data conversion tool Cvt2Mae. Appendix D covers downloading,
installing and running MAExplorer as a stand-alone Java application on
a local computer. Appendix E
discusses design issues for the MAExplorer Java program and supporting
Web servers. Users may create new analytic methods and add them to MAExplorer as Java
plugin extensions (MAEPlugins). There is a
glossary of terms used in MAExplorer. There is also a List of Figures, a List of Tables, and an Index to help find material of interest.
MAExplorer is normally used as a stand-alone program
Figure 1 gives an overview of the
system. Note that MAExplorer does not perform spot
quantification from raw scanned images - it is used for the
subsequent data mining analysis of quantified spot data. Figures 1.1.1 through
1.1.3 describe this in more detail.
Figure 1. Overview of MAExplorer exploratory data analysis
system. Initial data preparation steps are performed prior
to analysis by MAExplorer and are indicated by cyan
italics at the top of the figure. The primary data consists of
quantified microarray image data as well as corresponding qualitative
clone ID, gene-in-plate-order (GIPO or print-table, etc.), gene name,
hypertext base references and related information. After the
microarrays are hybridized, they are scanned and spots quantified
using image spot quantification programs. These lists are then saved
for each array in a tab-delimited file. Microarray image
quantification may be performed by various software such as Axon's
GenePix(TM), Scanalyze, Molecular Dynamics
ImageQuant(TM), Research Genetics' Pathways(TM),
etc. When used as a stand-alone application, data may be saved on the
local computer for local off-line use, and direct access to other
Internet genomic databases may be made without using a proxy server.
[DEPRECATED: When used as an applet, the auxiliary
databases and the MAExplorer Jar files are copied to the Web server or
local file system (in the case of the stand-alone version) where they
are then available to be downloaded by users. When a user invokes a
Web page containing the Java applet, it first downloads the applet
that then downloads auxiliary databases including a configuration file
that describes the array data. It then downloads the subset of
quantified microarray spot data files requested for the set of
hybridized samples being investigated. Additional samples may be
downloaded at any time. When the user selects an operation that
requires access to Web databases not residing on the MAExplorer Web
server, implicit Java security restrictions prevent the applet from
going directly to these other Web servers. Instead, it asks the
MAExplorer proxy server to request the data from the foreign Web server
and return it to the user's Web browser. ]
Figure 1.1.1 Overview of MAExplorer exploratory data analysis
system. MAExplorer is used as a stand-alone application on local
data. [Its use as a Web browser applet has been DEPRECATED. In
the case of the applet, it may only access quantified array data from
the Web server that launched the applet.]
Figure 1.1.2 Overview of data preparation for quantified spot data
used by MAExplorer. MAExplorer handles quantified spot data as
shown in this figure. Arrays hybridized against labeled samples
are scanned, and the spots are quantified into spot data files. Quantified
spot data is represented as tab-delimited data with data for one
spot/row. Each spot is identified in this file by its grid
coordinates (grid, grid row, grid column) with image (X,Y) coordinates
being optional. Quantified spot data includes the raw spot intensity
for each channel (in the case of multiple channels such as Cy3, Cy5,
etc.). If the original data has background spot intensity values, then
that may be included as well - otherwise no background data will be
available for background correction. The spot data is discussed in
more detail in Section 1.1 and Appendix C.1, and Appendix C.3.
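As a rough illustration of the one-spot-per-row, tab-delimited layout described in this caption, the following Java sketch parses a single row. The column order (grid, grid row, grid column, raw intensity, optional background) and the class name SpotRow are assumptions made for this example only; the actual file format is the one specified in Appendix C.3.

```java
/** Hypothetical holder for one row of a quantified spot data file. */
final class SpotRow {
    final String grid;       // grid name, e.g. "A"
    final int gridRow;       // row within the grid
    final int gridCol;       // column within the grid
    final double intensity;  // raw spot intensity
    final Double background; // null when the file has no background column

    SpotRow(String grid, int gridRow, int gridCol, double intensity, Double background) {
        this.grid = grid;
        this.gridRow = gridRow;
        this.gridCol = gridCol;
        this.intensity = intensity;
        this.background = background;
    }

    /** Parses one tab-delimited line in the assumed column order. */
    static SpotRow parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 4) {
            throw new IllegalArgumentException("expected at least 4 columns: " + line);
        }
        Double bkg = (f.length > 4 && !f[4].trim().isEmpty())
                ? Double.valueOf(f[4].trim())
                : null;
        return new SpotRow(f[0].trim(),
                Integer.parseInt(f[1].trim()),
                Integer.parseInt(f[2].trim()),
                Double.parseDouble(f[3].trim()),
                bkg);
    }

    /** Background-corrected intensity, or the raw value if no background is available. */
    double corrected() {
        return background == null ? intensity : intensity - background;
    }
}
```

A row without a background column simply yields the raw intensity from corrected(), which mirrors the statement that no background correction is possible when the original data lacks background values.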
Figure 1.1.3 Overview of running MAExplorer as a stand-alone
application. The preferred way of running MAExplorer is as a
stand-alone application. There are distinct advantages in running
MAExplorer as an application: data and the exploration state
may be saved on the user's local computer, and direct access to genomic
servers is easier (no proxy server required - see Figure 1.4). MAExplorer plugin
extensions (MAEPlugins) may only be used with the stand-alone
version. Since MAExplorer is packaged for
download for a variety of operating systems, using this method is
not difficult to set up and the MAEPlugins should run on a variety of
operating systems.
Figure 1.1.4 [DEPRECATED] Overview of running MAExplorer as a Web
browser applet. An alternative way of running MAExplorer on
existing databases is as a Web-browser applet. The advantage of this
method is that no software installation is required on the user's
computer. However, the user may not save data and the exploration
state on their local computer. Furthermore, direct access to genomic
servers requires a proxy server. MAExplorer plugin extensions
(MAEPlugins) may not be used with the applet version. The Mammary Genome Anatomy Program
(MGAP) originally used the MAExplorer applet.
In MAExplorer we refer to grids by letter names (A,B,C,...) and fields
by F1 and F2. If you are using Cy3/Cy5 ratio data and the Cy3 and Cy5
data is available as independent channels for each HP sample, then
operations that use F1 and F2 will use the Cy3 and Cy5 data for
various operations such as scatter plots (Cy3 vs Cy5), etc. If there
is only one field in an array (i.e. no duplicate grids), then when
MAExplorer is run, operations and menus describing F1 and F2
operations will not be available.
Using duplicate (F1 and F2) spots allows us to get an estimate of the
hybridization variance within an array and is used to compute the
(F1,F2) gene coefficient
of variation (CV) used in the gene data Filter to remove noisy
data before looking for additional differences. Note that if Cy3/Cy5
data is used, then F1 and F2 duplicates are not allowed as MAExplorer
uses the (F1,F2) data to hold the (Cy3,Cy5) data for a hybridized
sample.
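To make the (F1,F2) coefficient of variation concrete, here is a minimal Java sketch. This section does not give the exact formula MAExplorer uses, so the definition below (sample standard deviation of the two duplicate intensities divided by their mean) is an assumption, as are the class name DuplicateSpotCv and the threshold parameter.

```java
/** Sketch of a CV-based filter on duplicate (F1,F2) spot intensities. */
final class DuplicateSpotCv {

    /**
     * CV of two duplicate intensities, assumed here to be
     * (sample standard deviation) / mean. Returns NaN when the mean is zero.
     */
    static double cv(double f1, double f2) {
        double mean = (f1 + f2) / 2.0;
        if (mean == 0.0) {
            return Double.NaN;
        }
        // For n = 2 the sample standard deviation reduces to |f1 - f2| / sqrt(2).
        double sd = Math.abs(f1 - f2) / Math.sqrt(2.0);
        return sd / mean;
    }

    /** True if the gene's duplicates agree to within the given (hypothetical) CV threshold. */
    static boolean passes(double f1, double f2, double maxCv) {
        double value = cv(f1, f2);
        return !Double.isNaN(value) && value <= maxCv;
    }
}
```

Under this definition, identical duplicates give a CV of zero, and genes whose duplicates disagree strongly are rejected before further comparisons are made.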
Example of a MAExplorer database -
http://www.lecb.ncifcrf.gov/mae - the public MGAP DB
The Mammary Genome Anatomy Program (MGAP) microarrays of cDNA clones
from mouse mammary tissue (collaboration with Research Genetics) were
hybridized with 33P radio-labeled samples. These were then
used to charge fluorescing plates. See the MGAP site for more
documentation on the database and preparation procedures. The
hybridized arrays are scanned on a phospho-imager scanner at high
resolution. Spot data was quantified from these images using the
Research Genetics' "Pathways 2.01" program which generated
tab-delimited data files. This data also includes the microarray grid
point locations (field, grid, grid row, grid col) from the associated
microarray description data files (grid-in-plate-order data). When you
download MAExplorer, you will also
download the public MGAP dataset.
1.1 Microarrays and notation used with MAExplorer
In general, microarrays are hybridized using cDNA samples derived from
mRNA labeled with either radio-label, biotin, fluorescent dyes, or
other methods (see Schulze, 2001 for a review of the technology). MAExplorer may be used to
construct databases using single-labeled sample intensity (e.g.,
Affymetrix, 33P radio-labeled, etc.) and double-labeled
ratio fluorescent (i.e. Cy3/Cy5) data arrays with different GIPO
geometries.
Definition of "Condition list of samples"
Samples are organized into Condition Lists of samples (generally
replicate samples). These may be used in various statistical and
clustering tests. There are three built-in lists of samples called
the HP-X 'set', the HP-Y 'set' and the HP-E list. The X and Y sets are
used in various 2 condition tests such as the t-Test between the X and Y
sets (Section 2.4.3). The HP-E list is an ordered expression list
of samples used in clustering and in displaying expression
profiles. You may interactively define new or edit
named condition lists using a graphical wizard (Section 2.6), manipulate and assign them to the HP-X
'set', HP-Y 'set' and HP-E list. Some examples of condition lists
might be (assuming you have the data available in your database):
Virgin = ( V.1, V.2, V.3 )
Pregnancy = ( P13.1, P13.2, P13.3 )
Lactation = ( L3.1, L3.2, L3.3 )
Involution = ( I4.1, I4.2, I4.3 )
Definition of "Ordered Condition list" of multiple condition lists
We further extend this paradigm by defining a meta-data structure
called the "Ordered Condition List" or OCL. This is a list of
multiple conditions that you have previously defined. The OCL
may be sorted if you want and the data lends itself to
sorting. E.g., a time series of conditions lends itself to sorting -
different types of diagnoses may not. The OCL may be used in various
statistical tests (e.g., the F-test applied to the current OCL - see Section 2.4.3). You may interactively
define new or edit named Ordered Condition Lists using a graphical wizard
(Section 2.7). An example of an ordered condition list might be:
Parturition = ( Virgin, Pregnancy, Lactation, Involution )
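The relationship between condition lists and an ordered condition list can be sketched as a small data structure. This is only an illustration of the concept, not MAExplorer's internal representation; the class ConditionLists and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: named condition lists of samples, and ordered lists of those conditions. */
final class ConditionLists {
    /** Condition name -> replicate sample names, kept in definition order. */
    private final Map<String, List<String>> conditions = new LinkedHashMap<>();

    /** Defines (or redefines) a named condition list of samples. */
    void define(String name, String... samples) {
        conditions.put(name, Arrays.asList(samples));
    }

    /** Expands an ordered condition list into its samples, preserving condition order. */
    List<String> expand(List<String> orderedConditionList) {
        List<String> samples = new ArrayList<>();
        for (String name : orderedConditionList) {
            List<String> members = conditions.get(name);
            if (members == null) {
                throw new IllegalArgumentException("undefined condition: " + name);
            }
            samples.addAll(members);
        }
        return samples;
    }
}
```

For example, after defining Virgin as (V.1, V.2, V.3) and Lactation as (L3.1, L3.2, L3.3), expanding the ordered list (Virgin, Lactation) yields the six samples in time-series order, which is the kind of ordering an expression profile or F-test over conditions relies on.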
Definition of "intensity" for single-labeled samples
MAExplorer uses the term "intensity" in slightly different ways
dependent on whether you are using the single-labeled or fluorescent
double-labeled data. For single-labeled data, "intensity" is the raw
quantified data value as measured by the image scanner. Raw data must
be normalized between samples in order to compare it between
samples. Therefore, to compare N samples, you must first normalize the data and then
compare them.
Definition of "intensity" for fluorescent double-labeled samples
For fluorescent double-labeled data, the Cy3 and Cy5 dye-labeled (for
example) measurements are the raw quantified data values as measured
by the image scanner. In this case, "intensity" is defined as the
ratio of Cy3 to Cy5 (i.e. Cy3/Cy5). If you wish to look at the ratio
as Cy5/Cy3, you may flip the two channels on a per-sample basis (see
Section 2.2.2 for more
details).
Issues of experimental design of microarray experiments
Some of the issues involved in experimental design
(setting up experiments) based on the types of arrays are discussed in
Section 3.1.1 for (Cy3/Cy5)-labeled as well as 33P-labeled
samples. Poorly designed experiments will not yield significant
statistical results, so attention should be paid to developing an
adequate and robust design for your data given costs of doing
experiments as well as statistical constraints on analyzing the data.
Actual and "Pseudoarray" image geometry
The main MAExplorer window contains a pseudoarray image for
visualization purposes. It may or may not correspond to the spot
positions on the actual array. This array geometry is defined by the
number of replicate Fields (normally 1) each of which contains a
number of grids (also called "blocks") containing a number of
rows/grid and columns/grid of spots. If there is no explicit array
geometry or spot (X,Y) coordinate data available but simply gene
identifiers and intensity data, then an arbitrary pseudoarray geometry
is generated. If there is an explicit array geometry, then it will
draw the pseudoarray using this geometry. The database configuration
determines which method will be used and is discussed in Appendix C.5. If there is no
explicit grid geometry, the number of spot Locations (e.g., IncyteID,
Affymetrix probe_set) may be used to synthesize a set of grids of a
size that is reasonable for viewing with MAExplorer. This is done in
the Cvt2Mae array data
conversion program when the array
geometry (#grids, #rows/grid, #columns/grid) is not known. This
conversion is not done in MAExplorer itself.
In Cvt2Mae we generate a visually appealing pseudoarray image geometry
if no array geometry is specified with the data (e.g. Affymetrix data,
etc). It maps the N spot data entries to a
(#grids, #grid-rows, #grid-columns) geometry. The algorithm is given in Appendix C.6 as well as
a suggestion for handling
non-standard geometries using Cvt2Mae.
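To give a feel for what mapping N spots onto a (#grids, #grid-rows, #grid-columns) geometry involves, here is a deliberately simple Java sketch. It is not the Cvt2Mae algorithm (which is given in Appendix C.6); it assumes the caller picks the grid size and only computes how many grids are needed. The class name PseudoArrayGeometry is hypothetical.

```java
/** Sketch: choose a number of fixed-size grids large enough to hold N spots. */
final class PseudoArrayGeometry {
    final int grids;
    final int rowsPerGrid;
    final int colsPerGrid;

    private PseudoArrayGeometry(int grids, int rowsPerGrid, int colsPerGrid) {
        this.grids = grids;
        this.rowsPerGrid = rowsPerGrid;
        this.colsPerGrid = colsPerGrid;
    }

    /** Returns the smallest geometry with the given grid size that holds nSpots spots. */
    static PseudoArrayGeometry forSpots(int nSpots, int rowsPerGrid, int colsPerGrid) {
        if (nSpots <= 0 || rowsPerGrid <= 0 || colsPerGrid <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        int perGrid = rowsPerGrid * colsPerGrid;
        int grids = (nSpots + perGrid - 1) / perGrid; // ceiling division
        return new PseudoArrayGeometry(grids, rowsPerGrid, colsPerGrid);
    }

    /** Total number of spot positions in this geometry (may exceed the number of spots). */
    int capacity() {
        return grids * rowsPerGrid * colsPerGrid;
    }
}
```

Any positions beyond the last real spot would simply be left empty in the pseudoarray image.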
Gene coordinate numbering on the microarray
A gene coordinate
numbering is a mapping of gene identifiers to locations on the
array for a particular array geometry. These are described by
grids (or blocks), each consisting of grid rows by
grid columns of spots. The grids may be repeated on the array
and constitute duplicate fields. Some arrays group subsets of
grids into meta-grids which are specified by meta-grid
rows by meta-grid columns of grids. MAExplorer can handle
grids but not meta-grids. In the case where there is no
array grid geometry specified or meta-grids are used, an arbitrary
pseudoarray geometry can be constructed to serve as a basis to display
the microarray pseudoimage (see the Algorithm for constructing the
pseudo array from a list of spots in Appendix C.6).
Example: special array spot coordinate numbering for the MGAP array
As an example of this coordinate system, the following describes the array geometry for the array used in the NIDDK MGAP database. The general principle is the same for other arrays with different sizes and numbers of fields. The MGAP array was spotted by Research Genetics for MGAP. Clones in the array are laid down in grids consisting of 8 rows and 24 columns per grid. There are 8 grids (named A through H or 1 to 8) to a field with a space between grids. Finally, there are two fields (left and right, named 1 and 2 or F1 and F2) that are duplicates.
Note: we currently present the MGAP arrays with grids A through H oriented from top to bottom, whereas Research Genetics orients them rotated +90 degrees with grid H to the left and grid A to the right. This occurred because the images were scanned with a -90 degree change in the orientation. Therefore, we have swapped rows and columns in our relative orientations so that the layout meets users' normal expectations of row-column orientation. This could easily be changed to the Research Genetics convention using a parameter in the configuration file. Since the actual plate coordinates are tracked with each clone and reported when it is accessed in MAExplorer, the image coordinate system is not that critical, although the verisimilitude of actual array layout and the data-mining layout can be useful.
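The MGAP geometry described above (2 fields, 8 grids per field, 8 rows and 24 columns per grid) can be captured in a few lines of Java. The linear ordering used below (field, then grid, then row, then column) is an assumption chosen for illustration, and MgapGeometry is a hypothetical class name.

```java
/** Sketch of the MGAP array geometry: 2 fields x 8 grids x 8 rows x 24 columns. */
final class MgapGeometry {
    static final int FIELDS = 2;
    static final int GRIDS = 8;
    static final int ROWS = 8;
    static final int COLS = 24;

    /** Total number of spot positions on the array. */
    static int totalSpots() {
        return FIELDS * GRIDS * ROWS * COLS;
    }

    /** Zero-based linear index for 1-based (field, grid, row, col) coordinates. */
    static int index(int field, int grid, int row, int col) {
        check(field, FIELDS, "field");
        check(grid, GRIDS, "grid");
        check(row, ROWS, "row");
        check(col, COLS, "col");
        return (((field - 1) * GRIDS + (grid - 1)) * ROWS + (row - 1)) * COLS + (col - 1);
    }

    /** Grid letter for a 1-based grid number: 1 maps to 'A', 8 maps to 'H'. */
    static char gridLetter(int grid) {
        check(grid, GRIDS, "grid");
        return (char) ('A' + grid - 1);
    }

    private static void check(int value, int max, String what) {
        if (value < 1 || value > max) {
            throw new IllegalArgumentException(what + " out of range: " + value);
        }
    }
}
```

With this ordering, the F2 duplicate of any spot sits exactly 8 x 8 x 24 = 1536 positions after its F1 counterpart, and the whole array has 3072 positions.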
Various gene identifiers may be present in the GIPO data file
associated with the array. One of these is selected as a unique
identifier to represent genes in the MAExplorer database. Normally,
the Master gene ID is defined as the Clone ID. However, if the
Clone ID is not present but the GenBank ID is, it will use the latter
as the identifier. If neither GenBank nor Clone ID is present, it
will use GenBank5' then GenBank3' if present. If those are not present,
it will use the UniGene ID if it is present. If that is not present, it
will use dbEST5' then dbEST3' if present. If those are not present, it
will use the LocusLink LocusID if present. Finally, if none of those
identifiers are present, you can specify a 'Generic ID' that is
related to some other database gene identifier such as a 'Location'
identifier.
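The fallback order for choosing the Master gene ID can be written as a simple priority scan. The Java sketch below follows the order given in the paragraph above; the field-name strings and the class MasterGeneId are hypothetical and are not the column names used in actual GIPO files.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Sketch: pick the Master gene ID type from the identifier fields that are present. */
final class MasterGeneId {
    /** Identifier types in the priority order described in the text (names are illustrative). */
    static final List<String> PRIORITY = Arrays.asList(
            "CloneID", "GenBank", "GenBank5'", "GenBank3'",
            "UniGeneID", "dbEST5'", "dbEST3'", "LocusID", "GenericID");

    /** Returns the highest-priority identifier type that is present, or null if none is. */
    static String choose(Set<String> presentFields) {
        for (String field : PRIORITY) {
            if (presentFields.contains(field)) {
                return field;
            }
        }
        return null;
    }
}
```

For instance, a database offering only GenBank and UniGene identifiers would use GenBank as its Master gene ID under this scheme.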
The current gene may be specified by clicking on a spot in the
microarray image or on a point in the popup scatter plot, or a gene
ID cell in a report.
Setting the "current gene" to a specific gene by "Master gene ID"
The MAExplorer uses the concept of the "current gene" to indicate a
particular gene to be analyzed. You may interrogate the microarray
database or Internet databases for data on the current gene or to use
it in one of the operations. For example, you might cluster genes by
expression profiles to find other genes with profiles similar
to the current gene.
Setting the "current gene" by Gene Name Guesser
In addition, the user may type a specific gene name or clone ID into a
popup Gene Name Guesser dialog text window. This is invoked by
clicking on the blue button "Enter gene name or clone ID" at the top
right in the control panel. When the "guesser" window pops up, start
typing the gene name or clone ID in the blue text entry field. You
select one of the Gene Name, Clone ID, UniGene ID, GenBank, GenBank
3', GenBank 5', dbEST 3', dbEST 5', or LocusID identifiers. Then you
may start typing letters and it will match all names or identifiers
which are prefixed with the sub-string you have typed so far. As you
type more characters, it will limit the list of possible completions
of what you are typing. After selecting the gene you want, you then
press the "Done" button to use this entry to set the current gene and
remove the guesser popup window. You may press the "Clear" button to
clear what you have typed and the "Cancel" button to cancel the
current gene selection process.
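The narrowing behavior of the guesser, where each typed character shrinks the list of possible completions, amounts to prefix matching. The following Java sketch illustrates the idea; whether MAExplorer matches case-insensitively is not stated here, so that choice is an assumption, and GeneNameGuesser is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: list the names or identifiers that begin with what the user has typed so far. */
final class GeneNameGuesser {

    /** Returns the entries of names that start with typed (case-insensitive by assumption). */
    static List<String> completions(List<String> names, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

Calling completions again with a longer prefix can only return a subset of the previous result, which is why the list shrinks as more characters are typed.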
Setting the "Edited Gene List" subset of genes using wildcard
names
You may also define a set of genes from the guesser window using
wildcard names where the character '*' matches zero or more
characters. First you specify a sub-string common to gene names. Then
press the "Set E.G.L." (set 'Edited Gene List') button. For example (see Figure
2.3.1), you could find all oncogenes and proto-oncogenes by typing
"*ONCO*" in the guesser. It automatically enables the View 'Edited
Gene List' in the array that shows genes in the E.G.L. enclosed in
magenta boxes.
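Wildcard selection, where '*' matches zero or more characters, can be sketched by translating the wildcard into a regular expression. In the Java example below, case-insensitive matching is an assumption (it lets "*ONCO*" match mixed-case gene names), and WildcardGeneMatcher is a hypothetical class name rather than part of MAExplorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: select gene names matching a wildcard in which '*' matches zero or more characters. */
final class WildcardGeneMatcher {

    /** Converts a wildcard such as "*ONCO*" into a regex; all other characters are literal. */
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char ch : wildcard.toCharArray()) {
            if (ch == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(".*");
            } else {
                literal.append(ch);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns the names that match the wildcard over their whole length. */
    static List<String> select(List<String> names, String wildcard) {
        Pattern pattern = compile(wildcard);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Quoting the literal parts keeps characters such as '.', '(' or '+' in gene names from being interpreted as regex syntax.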
The current gene cluster
Some operations involving clustering will automatically assign the
gene cluster to the E.G.L. This includes clustering of genes similar
to a selected (i.e. current) gene and K-means clustering. In the case
of K-means clustering, the cluster you select by picking a gene
belonging to that cluster will cause it to be defined as the current
cluster and also assigned to the E.G.L. This will be discussed in more
detail in the section on clustering.
The current Condition List of samples
The current condition list of samples
is the last condition edited with the interactive graphical wizard
(Section 2.6) used to define new or edit condition lists.
The current Ordered Condition List (OCL) of multiple conditions
The current ordered condition
list (a possibly ordered list of multiple condition
lists) is the last ordered condition list edited with the interactive graphical
wizard (Section 2.7) used to define new or edit ordered condition
lists.
Saving full resolution plots as GIF files in stand-alone mode
The various plots may be saved as full resolution GIF files when
running MAExplorer in stand-alone mode. The various plots have
"SaveAs" buttons which appear in stand-alone mode. Saving your
intermediate results may be useful for documenting your data mining
session or for subsequent publication. (Here is an example of a full
resolution
clustergram of 38 MGAP hybridized samples for 1076 named and EST
genes).
Saving Text windows as .txt files in stand-alone mode
The various text windows may be saved as .txt files when running
MAExplorer in stand-alone mode. The various text windows have
"SaveAs" buttons which appear in stand-alone mode. Saving your
intermediate results may be useful for documenting your data mining
session or for subsequent publication.
1.2 Microarray image quantification
Quantification data for all genes in a hybridized sample (x and y
coordinates, intensity, background density) is obtained by reading
data from a quantification file for that hybridized sample. The
quantification file for each hybridized sample resides on the local
file system (for stand-alone) or MAExplorer Web server (for applet
use) and is derived from image quantification programs such as Axon's
GenePix(TM) program, Scanalyze, Molecular Dynamics'
ImageQuant(TM) program, Research Genetics'
Pathways(TM) program, etc. These programs are
independent of MAExplorer and are not part of our downloadable
software distribution.
Normalization between hybridized samples must be performed to
allow comparison between different hybridized array samples. File
formats are discussed in Appendix C.
1.2.1 Ratio and Zscore comparison of data from different
hybridized samples
Because of variation between hybridized samples, data is normalized.
Methods that are pure scaling transformations (such as Median, Scale to 65K, By Calibration DNA, By Use Gene Set,
etc.) allow you to compare data using the ratio between two normalized
sets of data. We define the ratio for two samples as follows:
  ratio(x,y,c) = I_xc / I_yc
where:
  samples x and y have values I_xc and I_yc for the same
  gene c in samples HP-X and HP-Y.
The Zscore method transforms the data such that it cannot be used
with the ratio comparison. Instead we use the Zdiff(x,y) method for
comparing Zscores developed by Mark Vawter (Vawter, 2000). Zscores typically
cover the range of -3.0 to +3.0 (standard deviations) with a
transformed mean of 0.0. Therefore the Zdiff will typically cover the
range of -6.0 to +6.0.
Let
  Zscore(p,c) = (I_pc - mean_p) / stdDev_p
where:
  I_pc is the intensity of gene c for sample p, and sample p has mean
  mean_p and standard deviation stdDev_p.
Then,
  Zdiff(x,y,c) = Zscore(x,c) - Zscore(y,c)
where:
  samples x and y have normalized values Zscore(x,c) and Zscore(y,c) for the
  same gene c in samples HP-X and HP-Y, or HP-X 'sets' and HP-Y 'sets'.
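The ratio, Zscore and Zdiff definitions above can be tried out with a small Python sketch. It is not MAExplorer's implementation: the function names and intensity values are made up for illustration, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which form is used.

```python
from statistics import mean, stdev

def ratio(i_x, i_y):
    """Per-gene ratio of two (already scale-normalized) samples."""
    return [a / b for a, b in zip(i_x, i_y)]

def zscores(intensities):
    """Zscore of every gene in one sample: (I - mean) / stdDev."""
    m = mean(intensities)
    s = stdev(intensities)  # assumption: sample (n-1) standard deviation
    return [(v - m) / s for v in intensities]

def zdiff(i_x, i_y):
    """Per-gene difference of the Zscores of samples x and y."""
    return [zx - zy for zx, zy in zip(zscores(i_x), zscores(i_y))]

# Hypothetical intensities for five genes in samples HP-X and HP-Y.
hp_x = [100.0, 200.0, 300.0, 400.0, 500.0]
hp_y = [50.0, 200.0, 150.0, 800.0, 250.0]

ratios = ratio(hp_x, hp_y)   # one ratio per gene
zdiffs = zdiff(hp_x, hp_y)   # one Zdiff per gene
```

Because each sample's Zscores are centered on 0.0, the Zdiff values stay comparable even when the two samples have very different overall intensity scales, which is why Zdiff replaces the ratio for Zscore-normalized data.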
1.3 Microarray image and plot display
The MAExplorer displays one microarray pseudoarray image of the
hybridized samples. This is either for a single sample, the ratio of
two samples, the average of replicate samples or the ratio of two sets
of replicate samples, the ratio Cy3/Cy5 or Cy5/Cy3, or other
mappings. Section 2.4.4.1
Show microarray pseudoarray images menu describes these options and
shows some examples.
The Filter menu is used to select a set of data filters that determines which genes are selected. Each spot meeting the range threshold criteria is highlighted in the array image with a red circle in the intensity pseudoarray image or a white circle in the ratio pseudoarray image. How genes are highlighted depends on which Plot menu Show Microarray method and View menu modes were selected. If the Show 'Edited Gene List' (EGL) option is set in the View menu, genes in the EGL will appear as magenta squares. The "Filter mode" is always present and shows genes meeting various Filter criteria (to be discussed). You may interactively define a list of genes by clicking on them when the Click to add gene to edited gene list option is set in the Edit menu. Alternatively, you can click on a gene with the Control key pressed to add it to the EGL, or with the Shift key pressed to delete it from the EGL.
In all of the pseudoarray images, the grids in the image are labeled
field#-GridLetter (e.g. 1-C, 2-B, etc). This allows them to be
clearly identified as the user scrolls over the image that is larger
than the visible computer window.
There is also a popup alert message window for better informing
users of conditions that prevent them from doing the operation they
requested. You must press the Close button to pop-down the message,
although you may first press the SaveAs button to save the message to a
file. For complex problems, some of the messages may suggest what you
need to do to correct the problem.
Hybridized samples are selected from a list of all of the
samples in the database. To make it easier to select an HP, they may
be selected from submenus by their developmental stage (if supported
by your particular database) or from a list of all samples in the
database located on the left side of the pseudoarray image. If a sample
has never been loaded during a session, it will be loaded when you
request it.
The last sample selected is called the current sample or
current HP. That is the sample that is displayed in the pseudoarray
image in the primary MAExplorer window when using display modes
requiring a single sample.
Figure 1.3 Data Filter Venn diagram. This illustrates some of
the logical, data range and statistical tests criteria available using
the MAExplorer data Filter paradigm. Note that multiple criteria
may be selected from each of these categories. The extreme case,
probably never used, could use all tests.
A first-approximation approach to data-mining might be to sequentially
constrain the data of interest to find some changes and then to report
on those changes. We have arranged these commonly performed first-pass
operations as submenu entries in the Analysis Menu. The submenus are:
Figure 1.4 Screen view of MAExplorer main window with Analysis
Menu. The menu structure of MAExplorer was designed to allow users
to quickly perform commonly used data-mining operations. Other menus
are used for modifying the data (File, Samples, Edit, and View menus)
or accessing on-line Help menu information in a separate Web browser
popup window. MAExplorer menus are similar to most Windows PC
applications where pull-down menu selections are used to invoke
operations. The current hybridized array sample is displayed as a
pseudocolor ratio image of median normalized spot intensities.
Clicking on a spot assigns it as the current gene with data being
reported in the top most message area. The names of the current HP-X
and HP-Y samples are listed above that area. In general, clicking on
spots, points in plots or cells in spreadsheet reports will assign the
corresponding gene as the current gene and access Web genomic databases if enabled.
In addition to displaying the hybridized sample pseudoarray images,
derived data may be viewed in various types of plots. These include
scatter plots, histograms, ratio-histograms, expression profiles, gene
clustering, etc. Data may be presented as table reports, either as
active spreadsheets that can access genomic databases by
clicking on cells or as tab-delimited Excel-compatible tables that may
be cut (if your windowing system supports this) and pasted into an
Excel spreadsheet.
The selected HP-X and HP-Y samples are used when generating scatter
plots, ratio histograms and other graphics. Scatter plots and ratio
histograms may also be performed on the left and right sides of the
currently displayed HP array (fields F1 and F2 respectively if array
data has duplicate spots for the same genes).
A MAExplorer database contains a table identifying genes, so data is
accessible by gene name as well as by sub-strings identifying a set of
genes (e.g. "onco", which could be used to find any oncogene or
proto-oncogene in the database).
When the program starts, it displays the microarray image of the first
hybridized sample in the HP-X set of samples initially specified. If
you specify a new HP-X or HP-Y sample, then it changes the pseudoarray
image to correspond to that array. You may change the current HP-X or
HP-Y sample from either the Samples pull-down menu or by clicking on a
sample in the Active Sample list in the left of the pseudoarray
image. If you click the mouse on or near a spot, it will latch
onto that spot and define it as the current gene.
Note: In Figure 1.4,
genes that pass the MAExplorer data Filters are indicated by red
(white) circles around spots in the pseudograyscale (pseudocolor)
intensity (ratio) image. The pseudoarray image shows the gene data as
replicate grids of spots if there are two fields: Field 1 (left set of
gridded spots) and Field 2 (right set of gridded spots). If there is no
duplicate spot data, then only Field 1 is shown.
If background correction is enabled in the Normalization menu, then
intensity is reported in the message displays as intensity',
otherwise as intensity.
Normalization should also be used between hybridized samples -
whether the data is ratio data (i.e. Cy3/Cy5) or single sample
intensity arrays.
Setting up MAExplorer to work with user-specific data is
discussed later in this manual.
Figure 1.5.1 The MicroArray Explorer home page at
http://maexplorer.sourceforge.net/. The table of contents in
the left panel lists an introduction, a short tutorial and several
demonstration databases. Below that are links to documentation
including this reference manual, glossary and index. The Export
version discusses running MAExplorer with other arrays and as a
stand-alone version. The Download
application is a Web page for downloading and installing the
stand-alone Java application on your computer.
You may start MAExplorer in your Web browser from the MGAP
Startup DB. This offers several preset public databases consisting
of sets of hybridized samples as well as the empty database. After
you have clicked on a particular startup database, it will begin
loading MAExplorer - indicated by a red box with a
"Loading..." message in the top window of your browser. After
MAExplorer starts, this message changes to a white box with "Reading DB" while it downloads the data files
required. Finally, when it is ready for your interaction, it displays
a white box with a green "Ready".
NOTE: for Web browser invocation, the MAExplorer applet works with
Netscape 4.7, Internet Explorer 5.0, and HotJava on a Windows
(95/98/NT/2000/XP) system or a Solaris Unix system. Macintosh and SGI
systems seem to hang at times because of Web browser
problems. However, it works on all other systems as a stand-alone Java
application that you may download and
install on your computer. You might want to review these Web browser restrictions.
After the MAExplorer is started and the menus become active, you may
switch the preset hybridized samples to other samples using the
Samples pull-down menu. The last hybridized sample loaded
becomes the "current hybridized sample" and its image is the one
displayed.
The following Sections 2.1 through 2.7 describe the pull-down menus in
detail.
In stand-alone mode, the user may select the database subset to be
loaded from either a Web server or a local file system. When used as
an applet, this is pre-determined by the Web page where MAExplorer is
started. Opening a disk DB with 'Open disk DB' also restores any user defined gene sets
and other parts of the exploratory
state that were present when 'Save ... disk DB' was
invoked.
In the following menus, selections that are sub-menus are
indicated by a ''. Selections prefaced with a '' or a '' are checkbox
commands that are enabled or disabled, respectively. Checkbox menu items
have a "[CB]" at the end of the command. Selections prefaced with
a '' or a '' are multiple choice "radio button" commands that are
enabled or disabled, respectively; only one member of the group is
allowed to be on at a time. Radio button menu items have a "[RB]" at the end of
the command. Selections prefaced with a '#' are available only when
MAExplorer is run in the stand-alone mode. Selections prefaced with a '*'
require access to the backend Web server [Future]. Selections that
are not currently available will be grayed out in the menus of the
running program.
When used as an applet connected to a Web database server, databases
may be divided into public and collaborator projects. Users accessing
protected collaborator projects will be required to log-in to the
server and a popup login request will appear.
[In the future], each user will be able to save the state of their
exploration into a password protected directory of named states on a
Web server (e.g. by doing a 'Save ... Web DB' command). Later, they could
restore that state from the Web server by doing an 'Open Web DB'
command. Users would be required to register with that server to set
up a unique state-saving area. Once this facility is set up, users
may selectively allow other users to view selected data, implementing
a groupware environment for improving collaboration.
Figure 2.1.1 Example of the "Open file DB" command. The file
browser is opened in the current project directory with the name of
the currently opened file. You may select another .mae startup
database file to load in the current project. You may also "cruise"
the file system and load an .mae file from a different project directory.
The "Set project" command makes this easier since it gives you a list
of available projects that you may change to directly. The projects must have
been set up on your computer previously. The "New project" command can
be used for setting up new projects.
Figure 2.1.2 Example of saving a user session in a new startup file
using the "SaveAs DB" command. The file browser is opened in the
current project directory with the name of the currently opened
file. You may enter another .mae file name to save your current
session. Then when you restart MAExplorer using this new file, it will
restore the data mining state to where you left off (except that no
popup windows are opened).
A registered user may allow another registered user to access their
state or states (using the Open another user's state command)
if the user owning the data had granted them permission. The Share
user state and Unshare user state commands control these
permissions. There are two special share-users defined: public
to allow unlimited read-only access to the state they specify, and
private to disallow all access to a user state.
The first menu command, "Choose HP-X, HP-Y and HP-E samples",
lets you change the current working HP-X 'set', HP-Y 'set', and
HP-E 'list' hybridized samples.
The second menu command, "Choose named condition lists of samples",
lets you define or edit new named lists of hybridized samples. This is
useful for defining sets of replicate samples. These may be further
manipulated using the (Edit menu | Sets of Conditions (samples))
commands.
The third menu command, "Choose ordered lists of conditions", lets you
define or edit new named Ordered Condition Lists (OCL) of named
condition lists. This is useful for defining a sub-experiment
consisting of N conditions each with replicate samples. The last OCL
manipulated is defined as the "current OCL". The current OCL is used
in the OCL F-test Filter.
The fourth menu command, "Set Samples from lists", lets you change the
current HP-X and HP-Y samples as well as the HP-X 'set',
HP-Y 'set', and HP-E 'list' samples. This is similar to using the
"Choose HP-X, HP-Y and HP-E samples" command, but is more difficult to
use. You may change the current HP-X or HP-Y sample by clicking on
the sample name directly in the list of sample names on the left side
of the pseudoarray image (see Figure 2.2.3 legend).
The fifth menu entry, "Edit use (Cy5/Cy3) else (Cy3/Cy5) for each HP",
lets you swap data channels for Cy3/Cy5 data for individual samples.
Other menu commands list the status of the current HP-X 'set', HP-Y
'set', or HP-E 'list', and define condition class names that are
associated with the HP-X 'set' and HP-Y 'set'. The last menu entry,
"Use HP-X & HP-Y 'sets' else single samples", lets you switch
between using HP-X and HP-Y as single samples or sets of multiple
samples. For example, if you are using a scatter plot of X and Y, it
will switch the data being plotted from a comparison of single samples
to a comparison of means of sets of samples depending on the status of
the switch. Sets of samples are used extensively in data explorations.
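The effect of the "Use HP-X & HP-Y 'sets' else single samples" switch on a plotted (X, Y) point can be sketched as follows. This is an illustrative Python sketch only: the data layout, the function names and the choice of the first sample in each list as the "single" sample are all assumptions, not MAExplorer internals.

```python
def set_mean(samples, gene):
    """Mean intensity of one gene over a set of replicate samples."""
    values = [sample[gene] for sample in samples]
    return sum(values) / len(values)

def xy_point(hp_x_set, hp_y_set, gene, use_sets):
    """Return the (X, Y) pair that would be plotted for one gene.

    With use_sets on, the means over each set of samples are compared.
    With it off, a single sample from each side is compared (here,
    as an assumption, the first sample of each list).
    """
    if use_sets:
        return set_mean(hp_x_set, gene), set_mean(hp_y_set, gene)
    return hp_x_set[0][gene], hp_y_set[0][gene]

# Hypothetical normalized intensities: two replicate samples per side.
hp_x_set = [{"geneA": 100.0, "geneB": 40.0}, {"geneA": 140.0, "geneB": 60.0}]
hp_y_set = [{"geneA": 50.0, "geneB": 45.0}, {"geneA": 70.0, "geneB": 55.0}]

point_sets = xy_point(hp_x_set, hp_y_set, "geneA", use_sets=True)
point_single = xy_point(hp_x_set, hp_y_set, "geneA", use_sets=False)
```

With the switch on, geneA is plotted from the set means (120.0 versus 60.0); with it off, from the single samples (100.0 versus 50.0).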
Figure 2.2.1 Samples menu - selecting lists of samples by using the
"chooser". The hybridized samples assigned to the current HP-X,
current HP-Y, set of HP-X, set of HP-Y and expression profile list
HP-E may be changed from the Samples pull down menu. The
Choose HP-X, HP-Y and HP-E option lets you graphically change
the currently active HP-X and HP-Y sample sets and the HP-E list.
Figure 2.2.2 Samples menu - selecting samples by source
characteristics. The hybridized samples assigned to the current
HP-X, current HP-Y, set of HP-X, set of HP-Y and expression profile
list HP-E may be changed from the Samples pull down menu. The
specific "By Source" menus shown here are from the MGAP database.
This figure shows the user changing the current X sample from the
developmental stages submenu that is part of the "By Source" submenu.
Alternatively, samples containing a keyword or part of a keyword can
be found using a "guesser" popup window that allows the use of wild
cards. This is invoked using the "From list of all H.P.s" submenu. For
example, you could specify "*pregnancy*" to find all samples
containing that word.
Figure 2.2.3 Changing the current sample to either the HP-X or
HP-Y sample by clicking on a sample name at the left edge in the
microarray pseudoarray image. The current sample is indicated in
magenta. Click on the magenta "*" adjacent to the new name you want to
select and it will change the HP-X sample. To switch between setting
HP-X and HP-Y, click on the [X] Current
Sample box to change the sample to HP-Y. You can click on
[Y] Current Sample box to change it
back to HP-X. Then clicking on a sample name will set it to the
current HP-X or HP-Y that was selected. This figure shows that the
user had selected [Y] and C57B6-L10-29hrs for the new HP-Y sample.
The Set current HP-X sample and Set current HP-Y sample
commands offer another way to set the single current X and Y sample
(see Figure 2.2.3 above for
the preferred way using the "Chooser").
The Edit HP-X & HP-Y 'sets' of samples by source menu allows
the user to define HP-X and HP-Y as sets having multiple
hybridized samples. Then, the mean values of the genes are used
when comparing HP-X with HP-Y.
For example, the By Source database-specific entries for the
MGAP database include the following submenus.
The From list of all samples selection pops up a hybridized
sample guesser dialog window. As with the gene name guesser, you can
start typing in the name of a sample and it will give you a list of HPs
that match that initial string. You then click on the sample you want
and then press the Done button.
Figure 2.2.4 Samples menu - selectively swapping (Cy3,Cy5) data
channels for particular samples. This is only operative if your
database contains Cy3/Cy5 ratio labeling data. This is useful in
databases containing subsets of dye-swap experiments mixed in with
other samples that are not dye-swapped.
Figure 2.2.5 shows a screen illustrating a popup condition chooser
session. The set of all samples in the database is in the
scrollable "Remainder Samples" window in the upper left. The samples
you have selected for the condition list being edited are shown in the
upper right "Selected Samples in current condition" window. The list
of all conditions in the database is in the lower left "List of
Conditions" window. The current condition list that is selected is
highlighted and its contents displayed in the "Selected Samples"
window. User defined annotation associated with the current condition
is displayed in the right "Current Conditioned Annotation" window.
To add a new condition, click on the Add Cond button to define
the new condition name. The Remove Cond button is used to
delete a named condition list. The List Cond button pops up a
report listing the samples and annotation for the current
condition. The List All button pops up a report listing the
names of all of the conditions and the annotation names. You may add
or remove annotation names for all of the conditions. The Add
Ann button will add the new annotation you enter into all
conditions - you must enter the data for each condition that requires
it. You may Save the current status of all of the
conditions into your working database. If you press
Cancel before saving, your edits will not be saved.
Pressing the Done button will save the changes and
pop-down the window.
Figure 2.2.6 shows a screen illustrating a popup ordered condition
list (OCL) chooser session. The set of all conditions in the
database is in the scrollable "Remainder Conditions" window in the
upper left. The conditions you have selected for the OCL being edited
are shown in the upper right "Selected Conditions in current OCL"
window. The list of all conditions in the database is in the lower
left "List of Conditions" window. The current OCL that is
selected is highlighted and its contents displayed in the "Selected
Conditions" window. User defined annotation associated with the
current OCL is displayed in the right "Current OCL Annotation"
window. To add a new OCL, click on the Add OCL button to
define the new condition name. The Remove OCL button is used
to delete a named condition list. The List OCL button pops up a
report listing the conditions and annotation for the current OCL. The
List All button pops up a report listing the names of all
of the OCLs and the annotation names. You may add or remove
annotation names for all of the OCLs. The Add Ann button will
add the new annotation you enter into all conditions - you must enter
the data for each condition that requires it. You may Save
the current status of all of the OCLs into your working database. If
you press Cancel before saving, your edits will not be
saved. Pressing the Done button will save the
changes and pop-down the window.
Sets of genes or HP condition lists are very useful for tracking
complex data-mining sequences of analysis. For example, derived named
gene sets may be used in successive data filters and for reports.
One could do the following experiment given four different
types of HPs (e.g. virgin, pregnancy, lactation, and
involution)
The Edit menu contains the following main selections. All of these entities
and preferences are saved as part of the startup state when you
do a (File | Databases | SaveAs ... DB).
Figure 2.3.1 Edited Gene List defined from the Gene Name Guesser
using wildcards. The Edited Gene List was defined as the set of
genes containing the sub-string "onco" in it. The sub-string was
specified to the popup guesser window as "*onco*" using '*' characters
as wildcard symbols indicating that it should match any or no
characters. The button Gene Name may be toggled through a set
of other identifiers including Clone ID, UniGene ID, dbEST 3', dbEST
5', GenBank 3', and GenBank 5', LocusID, etc. depending on what
identifiers are available in your database. The user then pressed the
Set E.G.L. button on the guesser window that sets the E.G.L. to
those genes. If you have enabled the View menu "Show 'edited gene
list'" option, then the genes in the E.G.L. are viewed as magenta squares
in the pseudoarray image. You may do additional editing to
manually add or remove genes in the set. If a
2D scatter plot was being used, EGL labeled genes would appear there
as well. To select a particular gene as the current gene, click on the
gene you want in the list, then press the Done button.
If you are running MAExplorer in stand-alone mode, the current named
gene sets are saved when you save the DB using the Save disk DB
or Save as disk DB selections in the Databases submenu of the
File menu. The gene sets are saved in a State sub directory as
".cbs" files and are used to restore the gene sets when restarting
MAExplorer on a .mae startup file. The .mae startup file saves the
names of the .cbs files that are shared among the various startup
files for a given project. The implication then is that if you change
and save a gene set in one startup database, it will change in other
startup databases when they load that gene set. The advantage is that
different startup databases may view a gene set produced by another
database.
The Sets of genes operations in the Edit menu include:
The following is an example of List saved gene sets state
listing the catalog of named gene subsets in some of the MGAP
data. Note that sets #1 to #11 are fixed by the data in the GIPO file
and may not be changed by the user. Sets #12 to #14 are assignable
from other sets or in the case of the E.G.L, by various MAExplorer
operations. Sets #1 through #14 may not be removed whereas #15 and
higher may be removed.
The following figure illustrates selecting sets by name for gene set
operations.
Figure 2.3.2 Selection of gene sets for binary gene set
operations. This example computes the Boolean AND of two sets "ALL
NAMED GENES" and "60 genes closest to CA-III from Named and Ests", and
then the AND of the "Replicates" with the previous result. The first
result is saved in the set called "The 60 genes closest to Carbonic
Anhydrase-III". The second result is saved in the set called "Named
genes in the 60 genes closest to CA-III". Finally, the third result
is saved in the set named "Replicate genes in the 60 genes closest to
CA-III".
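The chained Boolean AND in this example behaves like ordinary set intersection. The Python sketch below uses small made-up gene sets standing in for the real ones; the variable names and members are hypothetical, and it shows only the logic, not how MAExplorer stores gene sets.

```python
# Hypothetical stand-ins for three named gene sets.
all_named_genes = {"CA-III", "Casein alpha", "Stat5a", "Keratin 18"}
closest_to_ca3 = {"CA-III", "Casein alpha", "EST-0001", "Keratin 18"}
replicates = {"Casein alpha", "EST-0001"}

# Boolean AND of two sets: genes that belong to both.
named_in_closest = all_named_genes & closest_to_ca3

# AND of a third set with the previous result narrows it further.
replicate_named_in_closest = replicates & named_in_closest
```

Each AND can only keep or shrink the set it is applied to, so the order in which the sets are combined does not change the final result.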
The following is an example of List saved HP condition lists
state listing the catalog of named HP condition lists.
The following is an example of List contents of saved HP condition
list state.
The Font Family submenu is used to set
the text font family. This may be useful if your computer is missing
some fonts or some fonts are easier to read than others. Note: some
fonts may not work well on your computer. If this is the case, try
another font. When you save the data mining session with the "SaveAs
file DB", it also saves the font you have set. For some plots or popup
text-windows, you may have to regenerate the popup window to see the
font changes.
Figure 2.3.4.1 Popup window allowing you to adjust all threshold
slider values. The Adjust all Filter threshold scrollers
command allows you to pre-adjust all threshold slider values used in
data filtering and in clustering. It may be easier to set the
approximate range before invoking the clustering operation because
changing a parameter will recluster your data.
The Define HP-X (HP-Y) class name command may be used to change
the names of the HP-X (HP-Y) experimental condition sets. These names
are used in various labels in the main window, popup plots and
reports, etc. The commands to change various names of database
components are in the Preferences submenu in the Edit menu.
Figure 2.4 MAExplorer main window with Analysis Menu. The menu
structure of MAExplorer was designed to allow users to quickly perform
commonly used data-mining operations as a first approximation
analysis.
Figure 2.4.1.1 Gene Class menu. The user may select a subset of
genes that belong to one of the classes of genes. This shows the
user selecting the set of "All named genes" that are indicated with
red (white) circle over the spots in the array intensity (ratio)
pseudoarray image.
Figure 2.4.1.2 Example of all replicated genes occurring more than
once in the array. This was selected by using the GeneClass
'Replicate genes'. You may use the data Filter "Filter by genes with
replicates" instead of the GeneClass. This has the advantage that you
may use other GeneClasses (e.g. ESTs, or All named genes, etc.).
Alternatively, you can find all of the replicates for a particular
gene by 1) using the Gene Guesser to find the particular gene you want;
2) pressing "Set E.G.L." to save it as an Edited Gene List; and 3) enabling the
Filter "Filter by E.G.L." at the same time. This will show all occurrences
of that gene.
The set of all genes constitutes a number of different gene
classes. It is possible to restrict the subsequent analysis to a
particular subset of these genes called a gene class. The
GeneClass menu operations include operations to select the
current set of genes to analyze from the set of all genes by their
membership in a gene class.
Some of the above gene classes are deduced from the gene name supplied
with the Gene In Plate Order (GIPO) file for the array. We use the
following automatic classification rules shown in Table 2.4.1.
Table 2.4.1 Rules for the automatic classification of gene names
into the default Gene Class sets. The gene name is analyzed
alphabetic-case independently.
Some quantification software (e.g. Research Genetics'
Pathways 2.01) measures background globally as: BGLow (low
background), BGAvg (average background), BGRms (root mean square
background). For MGAP, MAExplorer uses the BGLow value when you
request background subtraction. These values are read from the
MAExplorer Samples DB file (see Appendix
Table C.2.1.1). For other quantification programs, background may be
available on a per-spot basis in the quantification files. If the
latter is available in your data, it will be used if background
correction is enabled (see
Appendix C.3).
The background corrected intensity I'ij is
computed from the raw intensity Iij and background
intensity bkgrdHPi for H.P. i and spot j as
follows:
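Since the text above describes the correction as a background subtraction, a minimal Python sketch would subtract the background from each raw spot intensity, i.e. I'_ij = I_ij - bkgrdHP_i. The exact form used by MAExplorer is an assumption here, as are the function names and the values.

```python
def background_corrected(raw_intensities, bkgrd_hp):
    """Subtract one global background value (e.g. BGLow for the whole
    hybridized sample) from every raw spot intensity."""
    return [i_raw - bkgrd_hp for i_raw in raw_intensities]

def background_corrected_per_spot(raw_intensities, spot_backgrounds):
    """Variant for quantification files that supply a background
    value for each spot."""
    return [i_raw - bg for i_raw, bg in zip(raw_intensities, spot_backgrounds)]

# Hypothetical raw intensities for four spots of one sample.
raw = [520.0, 130.0, 95.0, 2040.0]

corrected_global = background_corrected(raw, bkgrd_hp=100.0)
corrected_per_spot = background_corrected_per_spot(raw, [90.0, 110.0, 100.0, 140.0])
```

Note that a weak spot can end up below 0.0 after subtraction (the third spot in the global case), which is the situation the "Filter by positive intensity" option is meant to handle.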
Figure 2.4.2.3 Scatter plot of HP-X and HP-Y 'sets' data. HP-X
is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13
filtered by "All named genes and ESTs". A) A scatter plot
using the Median normalization. B) A scatter plot using the
Zscore of the logs normalization. Notice how the Casein alpha outlier
is more apparent in the case of the Zscore log normalization. The
skewed plot is characteristic of much microarray data. Some
normalization methods (not currently included in MAExplorer) can
compensate for some of these artifacts (Dutoit, 2000) and are planned for
future MAEPlugins.
Figure 2.4.3 Filter menu. The Filter menu is a cascade of data
filters that restricts the set of genes to those passing all filters that
have been enabled, using whatever criteria were set for those
filters. This figure shows the GeneClass filter set to "All genes and
ESTs", the spot CV filter and Ratio (X/Y) range filters being set
interactively by the scroll bars on the right. The genes that pass
the filter are indicated with a red (white) circle in the array
intensity (ratio) pseudoarray image.
The Filter menu options are used to restrict the set of genes
by pre-filtering the data with a series of cascaded filter criteria
and tests. The resulting subset of genes passing the filter are then
used in the plots, reports and other data analysis methods. Some of
the filters require additional parameters that are set by the State
scrollers. The user will automatically be prompted for changes to
these scrollers (a threshold scrollers window will pop up) when the
filter is activated or changed. These values may also be set from the
Adjust all Filter threshold scrollers entry in the
Preferences submenu in the Edit menu. The filters are
broken up into subgroups in the following menu, with the grouping
having more to do with the criteria (i.e. gene set membership, data
range, or statistical tests).
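The cascade idea, where a gene is kept only if it passes every enabled filter, can be sketched in Python as a list of tests combined with a logical AND. The three filters, their thresholds, the field names and the gene data below are hypothetical stand-ins chosen for illustration; they are not MAExplorer's actual filters or defaults.

```python
# Hypothetical per-gene data: gene class, X/Y ratio and spot CV.
genes = {
    "geneA": {"gene_class": "named", "ratio": 2.5, "cv": 0.10},
    "geneB": {"gene_class": "named", "ratio": 1.1, "cv": 0.05},
    "geneC": {"gene_class": "est", "ratio": 3.0, "cv": 0.40},
    "geneD": {"gene_class": "calibration", "ratio": 4.0, "cv": 0.10},
    "geneE": {"gene_class": "est", "ratio": 2.0, "cv": 0.25},
}

def by_gene_class(gene):
    """Gene set membership test (assumed: named genes and ESTs)."""
    return gene["gene_class"] in {"named", "est"}

def by_ratio_range(gene, low=2.0, high=10.0):
    """Data range test on the X/Y ratio (assumed thresholds)."""
    return low <= gene["ratio"] <= high

def by_spot_cv(gene, max_cv=0.25):
    """Data range test on the spot coefficient of variation."""
    return gene["cv"] <= max_cv

def apply_filters(gene_table, enabled_filters):
    """Keep the genes that pass every enabled filter."""
    return [name for name, data in gene_table.items()
            if all(test(data) for test in enabled_filters)]

passing = apply_filters(genes, [by_gene_class, by_ratio_range, by_spot_cv])
```

Disabling a filter simply removes it from the list, so the surviving set can only grow: without by_spot_cv, geneC would also pass.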
The Filter by positive intensity
data submenu filter contains options that specify which spot
intensity values are to be considered when excluding negative
quantified spot data. Note: this filter only makes sense if your data
might have negative values (e.g. Affymetrix chip "Avg Diff" data) or a
background corrected value that is less than 0.0. The filter is
enabled by setting the "Filter by spots with positive intensity"
checkbox. Negative intensity values may occur with some types of
array quantification programs. In the "Check spots for positive
values mode" submenu, you may set the samples where the test may be
applied to spots from the current HP, the single (HP-X,HP-Y) samples,
(HP-X,HP-Y) 'sets' (replicated spots), or samples in the HP-E list
selected to be used in the filter. If there are (F1,F2) or (Cy3/Cy5)
data, then each spot must meet the threshold criteria.
The Filter by Good Spot data submenu filter contains options
that specify spots based on their quality. It filters out genes that
do not have "Good Spot" values defined by the optional
QualCheck spot data (see the list of codes in Appendix C.4). If there is no
such spot quality data, then all spots are considered "good". The
filter is enabled by setting the "Filter by spots with Good Spot
values" checkbox. All spots for the specified samples must meet the
criteria. In the "Check spots for Good Spot mode" submenu, you may choose
the samples to which the test is applied: spots from the current
HP, the single (HP-X,HP-Y) samples, (HP-X,HP-Y) 'sets' (replicated
spots), or samples in the HP-E list selected to be used in the filter.
The Filter by Spot Detection Value data submenu filter
contains options that specify spots based on their spot detection
value quality metric over the range of [0.0 : 1.0]. The filter is
available only if the data exists for your database and is ignored
otherwise. If active, it pops up a "Spot Detection Value" slider in
the range of [0.0 : 1.0]. Only spots greater than the slider value
pass the filter. This data could be the Affymetrix MAS5.0 "Detection
p-value" or some other metric correlated with spot detection quality.
The filter is enabled by setting the "Filter by per-sample Spot
Detection Value" checkbox. All spots for the specified samples must
meet the criteria. In the "Check spots for Spot Detection Value mode"
submenu, you may choose the samples to which the test is applied:
spots from the current HP, the single (HP-X,HP-Y) samples, (HP-X,HP-Y)
'sets' (replicated spots), or samples in the HP-E list selected to be
used in the filter.
The Filter by spot intensity [SI1:SI2] sliders submenu contains
options that determine how individual spot intensity thresholding is
to be applied in the Filter.
The Filter by [I1:I2] sliders submenu contains options that
determine how spot expression (intensity or (Cy3/Cy5) ratio value)
thresholding is to be applied in the Filter:
The Filter by ratio or Zdiff sliders submenu contains options
that determine how spot-ratio thresholding is to be applied in the
Filter. The spot ratio is mean HP-X / mean HP-Y for sets of
samples. The spot Zdiff is used if one of the Zscore normalization
methods is active and is computed as (mean HP-X - mean HP-Y) for sets
of samples.
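The two quantities defined above can be written out directly. This is a small Python sketch under stated assumptions: the function names and the example values are hypothetical, and the inputs are taken to be already normalized (plain intensities for the ratio, Zscores for the Zdiff).

```python
from statistics import mean

def set_ratio(x_values, y_values):
    """Spot ratio for sets of samples: mean HP-X / mean HP-Y."""
    return mean(x_values) / mean(y_values)

def set_zdiff(x_zscores, y_zscores):
    """Spot Zdiff for sets of samples: mean HP-X - mean HP-Y (Zscore data)."""
    return mean(x_zscores) - mean(y_zscores)

def inside_range(value, low, high):
    """True if the value lies inside the slider range [low : high]."""
    return low <= value <= high

# One gene measured in three HP-X and three HP-Y replicate samples.
ratio = set_ratio([200.0, 220.0, 240.0], [100.0, 110.0, 120.0])
zdiff = set_zdiff([1.5, 1.0, 0.5], [-0.5, 0.0, 0.5])
print(ratio, zdiff, inside_range(ratio, 1.5, 3.0))
```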
The Filter by Cy3/Cy5 HP-X ratio or Zdiff sliders submenu
contains options that determine how spot Cy3/Cy5 HP-X ratio
thresholding is to be applied in the Filter. The spot ratio is Cy3/Cy5
for normalized data unless one of the Zscore methods is used. In that
case, the Zdiff is used and is computed as (Cy3 - Cy5) for sets of
samples. If HP-X 'sets' is used, then it computes the mean Cy3 value
and the mean Cy5 value and uses those values in the above
computations.
The Filter by spot CV submenu filter contains options that
specify how the Coefficient Of Variation of the (F1,F2) or (HP-X,HP-Y)
'sets' (replicated spots) is to be used in the filter. The (F1,F2) CV
is available only if there are duplicate spots on the HPs.
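As an illustration of a spot CV test, the sketch below uses the textbook definition of the coefficient of variation (sample standard deviation divided by the mean). This is an assumption: the manual does not state which CV estimator MAExplorer applies to (F1,F2) duplicates, and the function name and threshold are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean.

    Returns None when the CV is undefined (fewer than two values or zero mean).
    """
    if len(values) < 2:
        return None
    m = mean(values)
    if m == 0:
        return None
    return stdev(values) / abs(m)

# Duplicate spots (F1, F2) for one gene on a single HP.
cv = coefficient_of_variation([90.0, 110.0])
passes = cv is not None and cv <= 0.25   # assumed CV threshold from the scroller
print(round(cv, 4), passes)
```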
Figure 2.4.3.1 Filtering using multiple scrollers. This example
is of Cy3/Cy5 time series data. It filters normalized spot intensity
of the Cy3 and Cy5 channels independently ([SI1:SI2] inside range)
where low intensity spots are eliminated. It then filters out genes
outside of the [R1:R2] ratio range.
Figure 2.4.3.2 Using the Positive Intensity data Filter.
This allows removing negative data if the data contains negative
intensity values (e.g. Some Affymetrix data has negative Average Difference
values which could be read as Intensity for MAExplorer).
You may switch between different representations of the microarray
spot pseudoarray image. It may be viewed as several different types of
pseudo images including an intensity gray value and a pseudo-color
Red/Black/Green image for ratio (HP-X/HP-Y) and Zscore (HP-X - HP-Y)
data. The p-Value results of comparing a HP-X 'set' with a HP-Y 'set'
of samples, or the CV of the HP-E 'list', can be displayed as a color
spectrum pseudoarray image.
Depending on the origin of the array data, the pseudoarray image may
closely resemble the original arrays. Otherwise, it is displayed as
a generic pseudoarray image containing grids that will fit the window;
these are not the same as the original array image. However, the
pseudoarrays are useful for getting a rough idea of the global changes
in the data between arrays and how many genes pass the data filter.
When enabled using one of the commands in the Section 2.4.5 Clustering menu, cluster data appears as
blue circles or squares drawn as overlays on
the pseudoarray image. These options are discussed in the section on
clustering. If you are doing K-means clustering, the
current cluster is displayed in the scatter plot if the latter is
active.
Scatter plots, ratio and intensity histograms of the mean (HP-X/HP-Y)
or (HP-X/HP-Y) 'set' data, or the F1/F2 or Cy3/Cy5 data. F1/F2
or Cy3/Cy5 plots are available if the data exists in your particular
database. That might be the case with replicate spots or with Cy3/Cy5
data. If the normalization is set to a Zscore or log mean mode, it
will compute Zscore scatter plots and histograms.
Clicking on spots in an array image or points in scatter plots sets
the current gene and will bring up data on the gene or (optionally)
access corresponding data from GenBank, UniGene, mAdb Clone, etc. databases in a
popup Web browser. Clicking on a bin in a ratio or intensity
histogram plot filters out all genes except for those in the range of
that bin.
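Filtering by a clicked histogram bin amounts to keeping only the genes whose value falls in that bin's range. The sketch below assumes equal-width linear bins, which may not match how MAExplorer lays out its ratio histograms; the function names and data are hypothetical.

```python
def bin_index(value, low, width, n_bins):
    """Index of the equal-width bin holding the value, clamped to the end bins."""
    i = int((value - low) // width)
    return min(max(i, 0), n_bins - 1)

def genes_in_bin(values, selected_bin, low, width, n_bins):
    """Genes whose value falls in the selected histogram bin."""
    return [gene for gene, v in values.items()
            if bin_index(v, low, width, n_bins) == selected_bin]

ratios = {"g1": 0.4, "g2": 1.2, "g3": 1.4, "g4": 2.7}
# Six bins of width 0.5 starting at 0.0; bin 2 covers [1.0, 1.5).
kept = genes_in_bin(ratios, selected_bin=2, low=0.0, width=0.5, n_bins=6)
print(kept)
```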
Expression profiles plots of selected genes or subsets of genes for
all samples in the HP-E list. These are active plots with data reported
when the user clicks in the plot.
Clicking on a spot (i.e. gene) in the microarray pseudo image or on a
point (i.e. gene) in the scatter plot defines that gene as the
"current gene" that is used in other operations. The current gene is
indicated in both plots with a green circle
around it. Similarly, you may modify
the 'Edited Gene List' from either the pseudoarray image or the
scatter plot. When viewing is enabled, it overlays those genes with
magenta squares.
Figure 2.4.4 Plot menu - selecting Ratio Pseudoarray
image. This displays a pseudocolor, shown in the scale on the left, that
indicates the ratio of the value of the HP-X sample / HP-Y sample (or
'sets' if the option to use HP-X and HP-Y 'sets' is enabled). If the
data is Cy3/Cy5 data, then this displays the ratio of the ratios using
the current normalization. Various other pseudoarray image
representations could be used.
If the database that was loaded contains only one sample, the
pseudoarray image display defaults to the pseudograyscale spot
intensity mode. If there is at least one HP-X and one HP-Y sample,
then the Pseudocolor HP-X/Y ratio or Zdiff mode is the initial
default display. If there are duplicate spots for each gene, you may
generate a Pseudocolor F1F2 ratio or Zdiff mode image. If you
are using Cy3/Cy5 ratio data and the data is available as independent
channels for each HP, then you may plot Cy3 vs Cy5 for individual
samples.
When available on the database server, the original image may be
displayed in a separate popup Web browser.
Table 2.4.4.1. Pseudocolors assigned to spots to represent data in
the X/Y ratios or X-Y Zdiffs pseudocolor array images. Each color
represents the normalized X/Y ratio or X-Y Zdiff depending on
Normalization mode. The 9 colors of the boxes represent the normalized
expression ranges.
The same data is shown in a variety of normalization and display formats.
Figure 2.4.4.1.1.1 Pseudoarray intensity image of median normalized
intensities of the current HP sample (C57B6 virgin 10 weeks from MGAP
database). The graylevel scale on the left edge of the pseudoarray
image indicates the spot intensity. All pseudoarray images have scales
that vary depending on the type of pseudoarray being displayed.
Figure 2.4.4.1.1.2 Pseudoarray intensity image of Zscore normalized
intensities of the current HP (C57B6 virgin 10 weeks from MGAP
database).
Figure 2.4.4.1.1.3 Pseudoarray intensity image of ZscoreLog normalized
intensities of the current HP (C57B6 virgin 10 weeks from MGAP database).
Figure 2.4.4.1.1.4 Pseudoarray intensity image of ZscoreLog
normalized intensities of the dual HP-X and HP-Y individual
samples. The Plot menu Show Microarray submenu toggle "Use dual
HP-X & HP-Y samples" option is set. HP-X is a C57B6 pregnancy day
13 and HP-Y is a Stat5a (-,-) pregnancy day 13.
Figure 2.4.4.1.1.5 Pseudoarray intensity image of ZscoreLog
normalized intensities of the dual HP-X and HP-Y sample 'sets'. The
Plot menu Show Microarray submenu toggle "Use dual HP-X & HP-Y
samples" option is set, as is the "Use HP-X & HP-Y 'sets'" option in the
Samples menu. HP-X is the mean of three 'C57B6 pregnancy day 13' and
HP-Y is the mean of three 'Stat5a (-,-) pregnancy day 13'.
Figure 2.4.4.1.2.1 Pseudocolor array image of median normalized X/Y
ratios. HP-X is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-)
pregnancy day 13. Each spot's color represents the normalized X/Y
ratio depending on Normalization mode. The color of the box is one of
9 colors representing the normalized expression ranges and assigned
according to the table "Ratio
mode".
Figure 2.4.4.1.2.2 Pseudoarray color image of normalized X/Y 'set'
mean value ratios. Mean of three HP-X C57B6 pregnancy day 13
samples and mean of three HP-Y Stat5a (-,-) pregnancy day 13 samples.
Each spot's color represents the normalized X/Y 'set' ratios depending
on Normalization mode. The color of the box is one of 9 colors
representing the normalized expression ranges and assigned according
to the table "Ratio mode".
Figure 2.4.4.1.2.3 Pseudoarray color image of X-Y Zdiffs. HP-X
is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13.
Each spot's color represents the normalized X-Y Zdiff using
the Zdiff normalization mode. The color of the box is one of 9 colors
representing the normalized expression ranges and assigned according
to the table "Zdiff mode".
Figure 2.4.4.1.2.4 Pseudoarray color image of X-Y Zdiff of log
data. HP-X C57B6 pregnancy day 13 sample and HP-Y Stat5a (-,-)
pregnancy day 13 sample. Each spot's color represents the normalized
X/Y ratio depending on ZdiffLog with StdDev normalization mode. The
color of the box is one of 9 colors representing the normalized
expression ranges and assigned according to the table "ZdiffLog mode".
Figure 2.4.4.1.2.5 Pseudoarray image showing color-coded p-values
for t-test comparison of HP-X and HP-Y 'set' samples. The HP-X and
HP-Y sets both have 2 samples each (more is obviously much better).
The data was normalized using the Median and a spot intensity
[SI1:SI2] data filter was applied to eliminate some of the noisy data.
Each spot's color represents a p-value in the range indicated in the
scale in the left edge of the image. Note that although all spots are
assigned a p-Value, many may not be very significant without adequate
preprocessing of the data (such as normalization and low intensity
spot removal). So use this display with care.
Figure 2.4.4.2 Scatter plot of HP-X and HP-Y single sample
data. HP-X is C57B6 pregnancy day 13 and HP-Y is lactation day 1.
A) An active scatter plot may be generated for the current HP-X
and HP-Y samples filtered by "All named genes". B) similar
plot for HP-X and HP-Y 'sets' of replicate samples (3 pregnancy and 4
lactation samples in the sets respectively). Clicking on a point in
the plot sets the current gene. C) Zoomed up region (of
B) at the bottom of the plot showing more detail and filtered
by just "All named genes". Zooming is performed by adjusting the X or
Y axes limits scroll bars. Note the points enclosed in magenta boxes
indicate genes in the E.G.L. gene list.
Figure 2.4.4.2.1 Scatter plot of multiple channel data from a
single sample. A) F1 Vs F2 data for a C57B6 pregnancy day 13
sample. B) Cy3 vs Cy5 data for a NCI mAdb mouse array sample.
C) Scatter plot of individual Cy3 channels from two different
ratio Cy3/Cy5 data hybridized samples. C) Scatter plot of
individual Cy3 channel of HP-X compared with Cy3 channel of HP-Y for
ratio Cy3/Cy5 data hybridized samples. D) Scatter plot of
individual Cy3 channel of HP-X compared with Cy5 channel of HP-Y for
ratio Cy3/Cy5 data hybridized samples.
The Intensity selection plots a histogram of the gene intensity
data values for each Filtered spot (gene) in the current hybridized
sample.
The Histograms submenu includes:
If Cy3/Cy5 ratio data is being analyzed, then the F1F2 histogram menu
entry becomes
Figure 2.4.4.3 Histogram plots. A) Ratio histogram of
HP-X/HP-Y data with particular histogram bin selected with the
constraint set to filter all genes > that bin. HP-X is 13 day
pregnancy C57B6 and HP-Y is day 1 lactation. The selected bin
thresholds are then used in the Filter with the resulting Filtered
genes shown in the array image. B) Zdiff histogram of HP-X -
HP-Y 'sets' for same data as (A) but with the ><
threshold constraint set to find genes outside of the symmetric
histogram range. C) Intensity histogram of HP-X data filtered
by [I1:I2] intensity range. As with ratio histograms, you can do
additional filtering by selecting a particular histogram bin that is
then used in the Filter. Filtering was disabled for the intensity
histogram. To apply the filter, the "Don't re-Filter" button would be
toggled to the "Re-Filter" state. The threshold constraints include:
=, >, <, >, <>, and ><. Note that each time
you click on the "Thr:" button, it cycles to the next option in the
threshold constraints list.
You may generate as many individual expression profile plots as you
want using the Display a gene's expr. profile for HP-E
command. However, only the last one will be active and will be updated
with different genes as you click on them in the microarray image or
scatter plot. This could be used to compare the EP plots for several
different genes. First view the EP plot for one gene, then create a
new EP plot for the second gene, etc.
If you use the Display Filtered genes expr. profiles
for HP-E command, it will generate a scrollable list of
expression profile plots for all of the genes passing the Filter. If
the number of genes is very large, it may take a while.
You may interrogate a line corresponding to a particular HP sample in
an EP plot by moving the mouse over the line and then selecting the
line. This will cause the name of the HP, its intensity and CV to
appear in the plot. If the Err check box is set, then the mean
of the intensity is indicated by a short horizontal bar and the +- CV
by red vertical error bars above and below the mean. Pressing the plot
style Line button repeatedly cycles the plot style through
Line (vertical bars for each point), Circle (small circles at
each point), and Curve (circles connected into a continuous curve of all
samples). In the case of mean expression profiles
used in K-means clustering, the standard deviation is used in
place of the CV value. The various clustering methods have EP
plots buttons. When they are invoked, the scrollable list of EP
plots is sorted by the clustering method ordered list of genes. This
enables you to view the data in the same order as that produced by the
cluster analysis. If the zoom nnX button is pressed,
then all of the plots are magnified by nn-fold to make low intensity
plots more visible. Pressing the button repeatedly cycles through:
1X, 2X, 5X, 10X and 20X. It does not change the data itself. The
Show HP names button pops up a numbered list of all HP entries
used in the expression profile. If you are in stand-alone mode, a
SaveAs GIF button will also be available for the EP overlay
mode (Figure 2.4.4.4.1) or individual EP plot. This saves the current
plot as a full resolution GIF file specified by the user in a popup
file browser window.
The Expression profile plots submenu contains:
Figure 2.4.4.4 Expression profile plots. A) Individual
expression profile plots may be created by clicking on any
gene. Multiple instances may also be created. Here we show some of the
presentation options for the 38 sample MGAP database. Error bars are
computed for the standard error for that sample. There are three
different plotting options: line, circle and curve. #1 is the default
line plot with error bars. #2 is the line plot without the error bars
but clicking on line 7 to find out which sample it is and what the
intensity value is. #3 is the circle plot with error bars, and #4 is
the curve plot without error bars. Window #5 shows the list of samples
corresponding to the 38 points in the EP plots. B) List of
EPplots of the oncogenes and proto-oncogenes in the database (set by
the guesser with "onco" and "Set E.G.L." and the Edited Gene List
Filter). The list would become scrollable if there were more than 10
profiles. Setting the current gene would scroll the list to the EPplot
for the current gene.
Figure 2.4.4.4.1 Expression profile plots.
A) Scrollable list of EP plots of Filtered named genes centered at
Carbonic anhydrase III.
B) Overlay plot of all named Filtered genes.
C) Overlay plot of all ONCO or PROTO-ONCO genes with the
draw EGL option active so the graphs are drawn for these genes.
When enabled, cluster data appears as blue circles
or squares drawn as overlays on the pseudoarray image. These
options are discussed in the section on clustering.
Cluster analysis plots include finding a subset of genes or subsets of
samples based on cluster analysis of expression profile similarity
measures. These show genes belonging to particular clusters, or genes
that cluster well with specified genes. Cluster methods include:
finding genes similar to the current selected gene within a "distance"
threshold; K-means-like clustering where you specify a seed gene and
the number of clusters; and hierarchical clustering with clustergram
and dendrogram graphics.
Figure 2.4.5 Cluster Menu options. The hierarchical clustering
option is being selected.
There are many methods for doing clustering - each with advantages and
disadvantages. We present three methods in MAExplorer and plan on
adding a variety of more powerful methods through the MAEPlugin
facility under development.
These methods may find genes belonging to particular clusters or genes
that cluster well with particular genes. Gene clusters are sets of
genes whose expression profiles are found to be similar according to a
particular metric. We now define what we mean by "similar". The ordered
list of hybridized samples used in computing the expression profiles
is the HP-E list. MAExplorer has two different dissimilarity
measures for Cij: Euclidean distance LSQdistij and Pearson correlation
coefficient rij. These are computed as
follows and are tested against the cluster distance threshold (set by
the slider in the preferences sliders). Let n= |HP-E|, the number of
samples in the expression profile. We define similarity as (1.0 -
normalized dissimilarity).
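The two measures named above can be sketched in Python as follows. The formulas are the standard textbook Euclidean distance and Pearson correlation over the n samples of an expression profile. How MAExplorer normalizes each dissimilarity before forming the similarity is not given here, so the two mapping functions at the end are assumptions, and all function names are hypothetical.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared differences over the n samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pearson_r(p, q):
    """Pearson correlation coefficient of two expression profiles."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0   # a flat profile has no defined correlation
    return cov / math.sqrt(var_p * var_q)

def correlation_dissimilarity(p, q):
    """Assumed mapping of r in [-1, 1] onto a dissimilarity in [0, 1]."""
    return (1.0 - pearson_r(p, q)) / 2.0

def similarity(dissimilarity, max_dissimilarity):
    """Similarity as 1.0 minus the dissimilarity scaled by an assumed maximum."""
    return 1.0 - dissimilarity / max_dissimilarity

d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(d, r, similarity(d, 10.0))
```

Note that the Euclidean measure is sensitive to the overall expression level, while the correlation measure compares only the shape of the two profiles.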
The Cluster plots submenu contains a number of clustering
methods. Pressing the Escape key during a long cluster operation will
abort the operation. If you are in stand-alone mode using the
ClusterGram, a SaveAs GIF button will also be available for
saving the current plot as a full resolution GIF file specified by the
user in a popup file browser window.
The Hierarchical Cluster plots submenu contains:
Figure 2.4.5.1 Similar genes clustered to the current gene.
This method finds all genes that are similar to the current gene as
those defined by their distance between expression profiles being less
than the threshold set by the user. Each gene that passes the cluster
distance threshold test is indicated in the image with a blue square where the size of the square is
proportional to its similarity. This data is from the 38 samples in
the MGAP database containing duplicated spots. A) Main windows
with popup cluster similarity report and cluster distance threshold
slider. B) Scrollable list of EPplots of similar genes with the
red error bars indicating the variation for duplicated spots for each
HP sample. The Err checkbox may turn the error bar overlays on
and off.
For both of these commands, if you want to view the expression profile
plots, click on the EP plot button in the cluster window and it
pops up the scrollable expression profiles window. If you click on a
gene in the image, it will select it as the new current gene and seed
gene and recompute the cluster of genes most similar to the new seed
gene.
For both of these commands, if you want a permanent report, click on
the "Cluster Report" button in the cluster window and it will generate
a report in the current modality (i.e. scrollable spreadsheet or
tab-delimited). You may switch between these two modes by pressing
the "Go '...'" button in the report.
Figure 2.4.5.2 Display of cluster counts for all genes less than
the cluster threshold from MGAP 38 sample database. The algorithm
counts the number of similar genes for each Filtered gene and draws
a blue circle whose size is proportional to
the number of genes similar to that gene. That is why there are a larger
number of the larger circles.
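The two displays described in Figures 2.4.5.1 and 2.4.5.2 share one idea: a gene is "similar" to another when the distance between their expression profiles is below the cluster distance threshold. The Python sketch below assumes plain Euclidean distance and uses hypothetical function names and data; it is an illustration, not MAExplorer's implementation.

```python
import math

def distance(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def similar_genes(profiles, seed, threshold):
    """Genes within the threshold distance of the seed gene, nearest first."""
    seed_profile = profiles[seed]
    hits = [g for g, p in profiles.items()
            if g != seed and distance(seed_profile, p) < threshold]
    return sorted(hits, key=lambda g: distance(seed_profile, profiles[g]))

def cluster_counts(profiles, threshold):
    """Number of similar genes found for every gene in turn."""
    return {g: len(similar_genes(profiles, g, threshold)) for g in profiles}

profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.1, 2.1, 2.9],
    "C": [1.0, 2.5, 3.0],
    "D": [5.0, 0.5, 0.2],
}
near_a = similar_genes(profiles, "A", threshold=0.45)
counts = cluster_counts(profiles, threshold=0.45)
print(near_a, counts)
```

In the count display, each value in the resulting table would set the size of the circle drawn for that gene.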
Figure 2.4.5.3 Genes clustered using the K-means cluster
method. A) Using the current gene as the initial cluster,
MAExplorer finds N orthogonal clusters assigning the set of filtered
genes to these clusters using the HP-E expression profiles. All
genes are iteratively assigned to these clusters. Genes belonging to
the current cluster are labeled with a green cluster number both in
the array and in the scatter plot. The slider determines the number of
clusters (set to 6 here). A 2D scatter plot shows the genes belonging
to cluster 6. The K-means cluster report on the right contains a sorted
list of the genes in each cluster and has buttons to generate EP
plots and reports as well as summary mean EP plots (shown) and mean
cluster reports. The detailed list is shown below. B) Part of
the scrollable EP plots for this data showing genes belonging to both
clusters #5 and #6. C) The mean EP plots for the 6 clusters.
We call the genes closest to the "center" of the K clusters primary
genes and they are reported with additional information. The "Cluster
[# genes]" entries in the distance-to-cluster fields indicate that
these genes are the center of the clusters (i.e. primary genes). The
distNext is the distance from this cluster center to the next nearest
K-means cluster center. The number of clusters N (6 in this example)
is set in the popup state scroller. If you change the value of N, it
will recompute the clusters and the primary genes.
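To make the terms "primary gene" and "distNext" concrete, here is a minimal K-means-style sketch in Python. It is not MAExplorer's algorithm: the farthest-first choice of the remaining seeds, the Euclidean distance, the iteration limit and all names are assumptions made for illustration. Only the use of the current gene as the first seed and of N as the number of clusters comes from the text above.

```python
import math

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_profile(rows):
    """Column-wise mean of a list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans_like(profiles, seed_gene, n_clusters, max_iter=20):
    genes = list(profiles)
    # Cluster 0 starts at the current (seed) gene; the other starting
    # centers are picked farthest-first (an assumed seeding rule).
    centers = [list(profiles[seed_gene])]
    while len(centers) < n_clusters:
        far = max(genes, key=lambda g: min(dist(profiles[g], c) for c in centers))
        centers.append(list(profiles[far]))
    assign = {}
    for _ in range(max_iter):
        new_assign = {g: min(range(n_clusters),
                             key=lambda k: dist(profiles[g], centers[k]))
                      for g in genes}
        if new_assign == assign:
            break                      # assignments stopped changing
        assign = new_assign
        for k in range(n_clusters):
            members = [profiles[g] for g in genes if assign[g] == k]
            if members:
                centers[k] = mean_profile(members)
    # Primary gene: the member closest to its cluster center.
    primary = {}
    for k in range(n_clusters):
        members = [g for g in genes if assign[g] == k]
        if members:
            primary[k] = min(members, key=lambda g: dist(profiles[g], centers[k]))
    # distNext: distance from each center to the nearest other center.
    dist_next = {}
    if n_clusters > 1:
        for k in range(n_clusters):
            dist_next[k] = min(dist(centers[k], centers[j])
                               for j in range(n_clusters) if j != k)
    return assign, primary, dist_next

profiles = {"g1": [1.0, 1.0], "g2": [1.2, 0.9], "g3": [5.0, 5.0],
            "g4": [5.1, 4.8], "g5": [9.0, 1.0]}
assign, primary, dist_next = kmeans_like(profiles, "g1", 3)
print(assign, primary, dist_next)
```

Changing n_clusters and calling the function again mirrors what happens when the value of N is changed in the state scroller: the clusters and their primary genes are recomputed from scratch.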
It draws magenta circles around the
primary genes in the microarray and the cluster number to the right of
the circle. The size of a circle corresponds to the number of genes
clustered with that circle. If you click on a gene belonging to any
cluster, it defines that cluster as the "current cluster". It will
change the labels of the subset of genes that belong to the current
cluster from red (white) circles to the green (yellow) cluster number of the
current cluster in the intensity (ratio) pseudoarray image. In addition,
the 'edited gene list' is set to the subset of genes that belong to
the current cluster. If you are also displaying a scatter plot, genes
in the current cluster have their red '+' characters changed to the
cluster number.
You can click on that gene in the array image to determine its
identity. You may also popup an ordered (same as the above report)
plot of the clusters expression profiles by clicking on the EP
plot button. You may plot the mean expression profiles of the N
clusters using the Mean EP plot button. You may generate a
report of all of the clustered genes or of the mean clusters using the
Cluster-Report or Mn-Cluster-Report buttons
respectively. If you change the Filter conditions, you may recompute
the clusters using the Recompute Clusters button. Closing the
text window will remove the magenta
circles. If you selected the current cluster, the genes that
belong to it will still be available in the 'edited gene list' for
making reports, saving as a gene subset or for additional gene
filtering. If you press the SaveAs GeneSets button, then K gene
sets are created with the names "Cluster#1", "Cluster#2", ...,
"Cluster#K". You can then save or rename the clusters you want and
delete the rest. If you press the ClusterGram button, it
displays the gene sets in a clustergram ordered the same way as
the cluster report.
Clustering is represented by a binary tree and is visualized as an
ordered gene clustergram and optional dendrogram sub-plot. This is
similar to the methods of (DeRisi,
1996), (Eisen, 1998), and
(White, 1999). Currently,
MAExplorer does 1-way clustering - not the 2-way clustering of (Weinstein, 1998) and (Eisen, 1998). Each row of the
clustergram represents a gene and each column represents a HP in the
HP-E list of samples. Each box in a row represents the normalized
expression of that gene for the HP represented in that column. The
color of the box is one of 9 colors representing the normalized
expression ranges and assigned according to the following table:
Table 2.4.5.4. ClusterGram pseudocolor assignments. The
colors are assigned to "box" entries in the clustergram corresponding
to genes. The color represents data as either the X/Y ratio or X-Y
Zdiff relative to the normalizing HP.
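The binary tree and the resulting row order of a clustergram can be sketched with a small agglomerative procedure in Python. The linkage rule MAExplorer uses is not stated in this section, so average linkage with Euclidean distance is assumed here, and the function names and profiles are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(members_a, members_b, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    total = sum(dist(profiles[a], profiles[b])
                for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))

def hierarchical_cluster(profiles):
    """Return (binary tree, leaf order); tree nodes are (left, right, distance)."""
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i][1], clusters[j][1], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0], d),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

profiles = {"g1": [1.0, 1.0], "g3": [5.0, 5.0],
            "g2": [1.2, 0.9], "g4": [5.3, 4.8]}
tree, order = hierarchical_cluster(profiles)
print(order)
```

The leaf order gives the rows of the clustergram from top to bottom, and the distance stored at each tree node is the height at which a dendrogram would join the two branches.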
Figure 2.4.5.4 Hierarchical clustering clustergram of genes
filtered by ratio histogram bins for 19 samples from the MGAP data
set. The hybridized samples are drawn as colored boxes in the 19
columns. Rows of boxes correspond to gene expression profiles. In
A), the set of all genes and ESTs was filtered by the CV filter
set to 0.387 and the normalization was the Zscore. The gene "Mus
musculus D9 splice variant 2 mRNA, complete cds" was selected as the
current gene in the clustergram. Data for this gene and the selected
HP column is indicated at the top of the clustergram. The list of the
19 samples is shown on the left. B) Details of clustergram and
dendrogram are shown where the user had selected a cluster distance
threshold at "Mouse mRNA for mitochondrial cytochrome c oxidase
subunit Vb" in the dendrogram part of the plot (zoomed by 2X). This
selection draws in red all parts of the dendrogram tree that are less than
this distance. C) shows the manual selection
of genes from the ClusterGram or Dendrogram by clicking on the gene
names you wish to capture in the Edited Gene List (EGL) while the
Control key is pressed. The zoomed subregion shows three genes in the
same cluster that were selected (magenta stars in the right edge of
the ClusterGram).
Figure 2.4.6 Reports menu. You may create either dynamic or
tab-delimited text reports of either Samples or of subsets of genes.
These may be presented as interactive dynamic tables as well as
scrollable text windows capable of being exported to Excel. If Web DB
access is enabled, clicking on an entry will bring up a Web browser
with access to GenBank data. If the report contains Clone ID as one
of the fields, you can click on it to have it define that gene as the
current gene and highlight it in the microarray image or scatter plot
(if it is being used). The reports are divided into two types - those
dealing with lists of arrays (i.e. the sample experimental condition)
and those dealing with lists of genes.
The Report menu includes:
The "Samples vs Samples correlation coefficients" computes the correlation
coefficients in an upper diagonal matrix for the current set of
Filtered genes showing HP samples similarity. The entries are of the
following form where HP:1 and HP:2 correspond to samples listed in the
field names of the table and the data is the intensity values using
the current normalization method.
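An upper diagonal table of sample-to-sample correlations can be sketched as follows in Python. The sample names and intensity values are hypothetical, and each list is assumed to hold the normalized intensities of the same Filtered genes in the same order.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0   # a constant sample has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def upper_triangle_correlations(samples):
    """Correlation for each sample pair (i, j) with i before j only."""
    names = list(samples)
    table = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            table[(a, b)] = pearson_r(samples[a], samples[b])
    return table

samples = {
    "HP:1": [1.0, 2.0, 3.0, 4.0],
    "HP:2": [2.0, 4.0, 6.0, 8.0],
    "HP:3": [4.0, 3.0, 2.0, 1.0],
}
table = upper_triangle_correlations(samples)
for pair, r in table.items():
    print(pair, round(r, 3))
```

Only the pairs above the diagonal are stored because the correlation is symmetric and every sample correlates perfectly with itself.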
The "Calibration DNA summary" table contains the computed means,
std-dev, and computed normalization scale factor for all active
hybridized samples. The scale factors are used if the 'Calibration
DNA' normalization is used.
You must set the Web access checkbox if you want to click on a blue
hyperlink in the resulting report to access an associated Web
database.
Figure 2.4.6.1 Hybridized samples dynamic Report windows. A)
Samples Info report. B) Sample Web links. Clicking on a blue
hypertext link brings up the corresponding genomic Web database entry
in a separate Web browser window if the Web access is enabled. The
tab-delimited version of the same reports (not shown) may be cut and
then pasted into other programs such as an Excel spreadsheet.
C) HP vs HP correlation table on genes passing the data
Filter for all samples in the HP-E list.
If Cy3/Cy5 ratio data is being analyzed, then the Highest (Lowest)
F1/F2 entries become
Figure 2.4.6.2 Gene Report windows of 50 named genes with highest
HP-X/HP-Y 'set' ratios. A) Dynamic gene report of 50
genes with highest HP-X/HP-Y 'set' ratios. A similar report may be
generated for the lowest ratios or for single HP-X/HP-Y samples. This
type of report may be generated for the highest or lowest Zdiff values
when the Zscore normalizations are used. Clicking on a blue hypertext
link brings up the corresponding genomic Web database entry in a
separate Web browser window if the Web access is enabled. It also sets
the current gene to the gene for that row. B) The
tab-delimited version of the same report may be cut and then pasted
into other programs such as an Excel spreadsheet.
Figure 2.5 View Menu options. These are divided into various
options for modifying the presentation as well as recording activity
such as the messages or history popup scrollable log windows.
Figure 2.5.1 Popup genomic browser database page. A) The
UniGene Web page pops up in a new Web browser window when the user
clicks on a gene in the array image, 2D scatter plot or Report and the
"Display current gene in Unigene Web Browser" toggle
is enabled in the View menu. The current gene was "Jun-B oncogene".
Alternatively, the B) mAdb Gene DB may be selected - as well as
GenBank or dbEST genomic databases. C) Alternatively, data from
the NCBI LocusLink database may be accessed if either the GenBank ID or
LocusID is available.
Figure 2.5.2 Examples of messages and command history popup log
windows. Measurements and other activity are shown in more detail
in the messages window whereas the command history indicates commands
(numbered in the order they are executed) in the command history window.
Data from either of these windows may be saved in text log files.
Figure 2.6 MAEPlugins paradigm. If you have a MAEPlugin .jar
file, then it may be specified using the "Load plugin" command. When
you invoke the command from the menus (or other methods), it accesses
any data it needs from the current MAExplorer database through the Open
Java API.
The Save RLO reports in time-stamped Report/ folder [CB]
option puts files generated by R from successive executions of the
same RLO into separate sub-folders in the Report/ folder with names
"RLOname-YYMMDD-HHMMSS/" to keep the data separate. This is
useful when you want to compare results from the same RLO method but
with different MAExplorer preprocessing.
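The "RLOname-YYMMDD-HHMMSS/" naming pattern can be reproduced with a short Python sketch. The function name and the RLO name "MyRLO" are hypothetical; only the folder name pattern and the Report/ folder come from the description above.

```python
from datetime import datetime
from pathlib import Path

def rlo_report_folder(report_root, rlo_name, when):
    """Build the time-stamped sub-folder path RLOname-YYMMDD-HHMMSS."""
    stamp = when.strftime("%y%m%d-%H%M%S")
    return Path(report_root) / f"{rlo_name}-{stamp}"

# Two runs of the same hypothetical RLO land in different sub-folders.
first = rlo_report_folder("Report", "MyRLO", datetime(2004, 3, 9, 14, 5, 7))
second = rlo_report_folder("Report", "MyRLO", datetime(2004, 3, 9, 14, 9, 30))
print(first.as_posix())
print(second.as_posix())
```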
You may download the latest versions of all plugins using the (File |
Update Plugins from
maexplorer.sourceforge.net) menu command. Similarly, you can
update your versions of the RLO methods using the (File |
Update RLO methods from maexplorer.sourceforge.net) menu command.
Figure 2.6.1 Loading a MAEPlugin from your file system using
the Load Plugins command in the Plugins pull down menu. If you
have a plugin .jar or .class file, it may be specified using the "Load
plugin" command. This pops up a file browser to let you specify the
plugin file.
Figure 2.6.2 Executing the new command previously loaded in the
Plugin menu. Selecting the new "Show List Active Filters" command
that now appears in the Plugins menu invokes the plugin. This pops up
a report shown in the next figure.
Figure 2.6.3 Popup window from executing the MAEPlugin.
This plugin gives a full report on the data Filter status in a new
pop up window.
Figure 2.6.2 Plugins menu - executing a previously loaded
plugin. Plugins that do not go into particular MAExplorer
submenus go into the Plugins menu. Selecting the command will
invoke that MAEPlugin.
The Help menu includes:
This section briefly addresses some of the issues you need to
consider. However, a full discussion of the issues involved is beyond
the scope of this manual. These issues are covered in other more
focused statistical methods literature and you might also address them
in consultation with biostatisticians. The Internet has vast resources
for microarrays. A few to get you started might include: a microarray
citation electronic library
http://arrayit.com/e-library/, the National Library of Medicine PubMed
journal search engine, and a general microarray Listserv
GENE-ARRAYS@ITSSRV1.UCSF.EDU. The MGED group (Brazma, 2001) has published the MIAME
(Minimum Information About a Microarray Experiment) standard.
This information is useful in doing an analysis. Also
try searching using general Internet search engines. There are a
number of public microarray data repositories. One that we find useful
is NCBI's GEO (Gene
Expression Omnibus), that contains array data and MIAME compliant
information about the arrays.
A good and appropriate experimental design (i.e. the design and
setting up of experiments to subsequently be analyzed) is critical for
resolving significant differences in gene expression between
experimental conditions. We touch on some of the issues here. (Simon, 2001), (Dudoit, 2000), and Kerr and
Churchill (2001a, 2001b) discuss some of the issues
of experimental design for microarrays. We do not currently implement
the Kerr-Churchill method. However, some of the issues involved in experimental design based on the types
of arrays are discussed in Section 3.1.1 for (Cy3/Cy5)-labeled as well
as 33P-labeled samples.
If users are comparing two different types of samples, the analysis
would be different than if they were comparing an ordered sequence of
samples (e.g. time series, cell cycle, dose-response, tumor-stage,
etc.). MAExplorer gives users the ability to:
Briefly, data mining is the discovery of potentially interesting
patterns in the data that were previously unknown. One approaches the
analysis of a set of data with minimal expectations. However, some
idea of what you are interested in helps focus the search. But beware
of the trap of mining the data until you get the results you hope
for. The following figure helps illustrate this process.
Figure 3.1 Flow chart of a typical data mining session. The
user makes some initial decisions on the experimental design such as
which hybridized samples to compare, the type and numbers of
replicates. They then make initial guesses as to the normalization
method to use, and the gene subset (the gene class) to concentrate on
when setting the data filter. The data is viewed in various modalities
to get a feeling for its inherent dynamic range and where interesting
outliers might appear. Clustering and plots help bring these
differences into view. The results are then evaluated and either the
process is finished or the views are refined by adjusting data
normalization and filter parameters, data subsets to be investigated,
clustering methods, plots etc. and the process repeated until the user
is able to see the differences between gene subsets more clearly or no
significant differences appear to be found.
Obviously, this approach is a first approximation to what is
eventually required. But it does capture the flavor of the data-mining
process. Typically the user would refine the search using variations
of the data filters and might contrast (using gene sets and hybridized
sample condition lists operations) results found under one set of
conditions with those found under another set of conditions.
Proper experimental design of microarray experiments is critical to
successful use of microarray data. Several recent reports discuss some
of the key issues involved in various aspects of statistical analysis
of microarrays: (Radmacher,
2001), (McShane, 2001),
(Korn, 2001), (Simon, 2001), (Dudoit, 2000).
An alternative method would be to compute
(Cy3X/Cy5Y) directly. However, this too
has its own sources of error and other problems, namely that not all
genes are labeled symmetrically with the two dyes since different dyes
may have different sequence specific affinities due to a variety of
causes. For that reason, dye-swap experiments are often
done. I.e. the two samples would be run as
(Cy3X/Cy5Y) as well as
(Cy3Y/Cy5X). If one were to plot
(Cy3X/Cy5Y) against
1.0/(Cy3Y/Cy5X) and the data were
perfectly symmetric (which they are not) then one would expect
a straight line. That is generally not what you get in practice.
Another issue is that when you have a number of samples A, B, C, D,
..., N and wish to compare them, there are a number of alternate
experimental designs you can use with different resulting sets of
advantages and problems. If a common pooled Cy5P
sample P were used, then the following experiments would be done:
MAExplorer is currently not oriented to handling these large
combinatoric types of non-pooled sets of experiments. However, you do
have the ability to swap (Cy3,Cy5) data on an individual basis so you
could compute an average of data from dye-swap experiments - but
with the caveats of non-uniform labeling mentioned above.
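As a rough illustration of averaging a dye-swap pair, the following Java sketch puts both measurements on an X/Y basis by inverting the swapped ratio and then takes their geometric mean. The class and method names are hypothetical, and the choice of a geometric mean (an average in log space) is our assumption for the example; it is not taken from the MAExplorer source.

```java
/** Hypothetical helper (not part of MAExplorer): combines a dye-swap pair. */
final class DyeSwapSketch {

    /**
     * forwardRatio is (Cy3X/Cy5Y) and swappedRatio is (Cy3Y/Cy5X).
     * Inverting the swapped ratio expresses both as X/Y. The geometric
     * mean used here is an assumption made for this sketch.
     */
    static double averageRatio(double forwardRatio, double swappedRatio) {
        if (forwardRatio <= 0.0 || swappedRatio <= 0.0) {
            throw new IllegalArgumentException("ratios must be positive");
        }
        double invertedSwap = 1.0 / swappedRatio;   // now also X/Y
        return Math.sqrt(forwardRatio * invertedSwap);
    }
}
```

If the labeling were perfectly symmetric, the forward ratio and the inverted swapped ratio would be equal and the average would simply return that value. In practice they differ, which is the caveat noted above.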
The major focus of the MAExplorer is interactive data mining with an
emphasis on direct graphical and tabular manipulation of the data.
The investigator is able to interact with the system by clicking on
spots in the array image, points in graphic plots, cells in
spreadsheets, by manipulating threshold sliders or typing in gene
names/clone IDs. This level of interaction allows investigators to
search for and identify patterns of differences with greater ease than
with a more static graphic system since it is easier to test ideas by
"grabbing onto the data". For example, "what" is the identity of
"this" outlier I am pointing to in a scatter plot; "which" genes are
best clustered with "this" gene in this clustergram and are perhaps
co-regulated; "which" genes have expression ratios within the range of
the histogram bin to which I am pointing?
Direct user manipulation of data, as incorporated in MAExplorer, was
defined by (Schneiderman,
1997) who defends the position that the direct manipulation of data
in data mining is an extremely effective means to amplify human
creativity in understanding patterns. Schneiderman's dogma states
"overview first, zoom, and then filter details on demand" and favors
the use of "shallow search trees, slide controllers, and
information-rich screens with tightly coordinated panel view of
data", (Beardsly,
1999). MAExplorer also uses many of these direct manipulation
principles. It was designed to run on desktop computers with data
residing on the same computer and loaded into its memory for rapid
direct manipulation - for both the Web browser and stand-alone
versions.
Part of the Flicker system allows comparison of user 2D gel images
with standard images from SWISS-2DPROT for putative identification of
unknown spots in the user gels. The user would select a standard
2D gel image from over 20 tissue types, enter their own 2D gel image
and align them at spots of interest. They could then switch to a
database access mode, click on those spots and generate popup
SWISS-2DPROT Web pages for those proteins - similar to Clone reports
in MAExplorer. That is accessed at
http://www.lecb.ncifcrf.gov/flicker/swissProtIdFlkPair.html.
MAExplorer will have a groupware facility similar to what we
have done with our
WebGel (http://www.lecb.ncifcrf.gov/webgel/) system described in
(Lemkin et al., 1999b). It is a
two-dimensional electrophoresis system for sharing data analyses. In
WebGel, users may perform a data-mining analysis and leave the state
of their analysis and accompanying notes to share with their
collaborators on a login-protected basis.
We now discuss using these tools for analyzing one's data.
Table 3.2 Steps in a data-mining analysis.
In designing a data mining experiment, the first decision to be made
is selecting the set of hybridized samples to be compared (steps
1 and 2). This is accomplished by setting the current hybridized
sample-X (HP-X) and hybridized sample-Y (HP-Y). In Figure 2.4.4.2 for the
scatter plot we selected a single C57B6 pregnancy day 13 and a single
Stat5a (-,-) pregnancy day 13 as current HP-X and current HP-Y
samples. Changing the normalization changes the view in the scatter
plot so that hidden differences may be more apparent (see Figure 2.4.2.3).
The names of the current HP-X and HP-Y samples are displayed at the top
of the main window. The current HP-X and HP-Y samples may be changed
at any time by clicking on a new sample from a list of samples shown on
the left side of the main window or from lists of samples organized by
sample population in the Samples menu.
The next decision to be made is selection of the genes to be studied
by choosing a subset from the gene class menu list
(step 4). Further selection occurs throughout the analysis by
clicking on spots in microarray images, points in graphic plots or
cells in spreadsheets, by adjusting threshold sliders, or using the
text-entry "guesser" to type in gene names, clone IDs, genomic IDs,
samples, etc.
The next decision the user must make is to set the intensity data
normalization mode (step 3). Normalization of quantitative data is
crucial when comparing data between different hybridized microarrays
because of spotting, hybridization efficiency, uniformity, and
other systematic errors.
Genes of interest may be separated from all of the genes in the
database using a cascade of data filters (step 4). Additional
filtering options are easily accessible in the (data) Filter menu. Some of the filters
require additional parameters. These parameters are set by state
scroll bars that pop up on the screen when data filters requiring them
are added to the filter cascade. Changing scroller values causes the
data filter to be automatically reapplied and a new set of genes
to be computed.
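The idea of a filter cascade can be sketched in Java as follows: a gene is kept only if every active filter accepts it, so adding a filter can only shrink the resulting gene set. This is a minimal sketch under our own assumptions. The class name, the use of gene indices, and the predicate interface are hypothetical and are not the MAExplorer implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical sketch (not part of MAExplorer) of a cascade of data filters. */
final class FilterCascadeSketch {

    /** Returns the indices of genes that pass every filter in the cascade. */
    static List<Integer> apply(int nGenes, List<IntPredicate> cascade) {
        List<Integer> passing = new ArrayList<>();
        for (int gene = 0; gene < nGenes; gene++) {
            boolean keep = true;
            for (IntPredicate filter : cascade) {
                if (!filter.test(gene)) {   // the first failing filter rejects the gene
                    keep = false;
                    break;
                }
            }
            if (keep) {
                passing.add(gene);
            }
        }
        return passing;
    }
}
```

Re-running `apply` whenever a threshold changes corresponds to the filter being reapplied when you move a state scroll bar.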
It is desirable to reduce false-positives found by the data filter by
eliminating genes with high quantification variability between
duplicate spots on the same sample or between spots duplicated in
replicate samples. If duplicate genes are available on the array (denoted by
Field 1 and Field 2 or F1 and F2 spots), this allows the computation
of a coefficient of variation (CV) for the duplicates. This CV may be
used in a data filter to reduce potential false-positives. CV is
computed as 2|F1-F2|/(F1+F2) using those spot values for each gene,
or as StdDevHP/MeanHP for a set of replicate
hybridized samples.
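The two CV definitions above can be written out as a short Java sketch. The class and method names are hypothetical. For the replicate case we assume the sample standard deviation (dividing by n-1); the text does not say which form MAExplorer uses.

```java
/** Hypothetical sketch (not part of MAExplorer) of the two CV computations. */
final class SpotCvSketch {

    /** CV of duplicate spots: 2|F1-F2|/(F1+F2). */
    static double duplicateCv(double f1, double f2) {
        return 2.0 * Math.abs(f1 - f2) / (f1 + f2);
    }

    /**
     * CV of one gene across replicate samples: StdDev/Mean.
     * Uses the sample standard deviation (n-1), which is an assumption.
     */
    static double replicateCv(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two replicates");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(sumSq / (n - 1));
        return stdDev / mean;
    }
}
```

For example, duplicate spots with intensities 100 and 300 give a CV of 2(200)/400 = 1.0, so a CV threshold well below 1.0 would remove that gene.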
Graphical views of the data give the user additional insights into the
data. These include spot intensity
and ratio or
Zdiff pseudoarray
images, scatter
plots
When there are too many EP-plots to be viewed simultaneously, you
might use a scrollable list of expression profile plots that lets you
scroll through an arbitrarily large list of genes. However, it is
difficult to compare genes that are not sorted in some way
(i.e. clustered). Therefore, these are most useful when used after
clustering the data and displaying the scrollable EP-plots of
the cluster-order data.
Clustering is one way of possibly finding co-expressed genes that
exhibit similar expression changes in a set of samples. Genes may show
similar co-expression, but that does not prove they are co-regulated
at the same point in a pathway - merely that measurements of those
genes in a particular set of experiments show similar
expression. However, identifying genes with similar expression for
which some information is already known about some of the genes may be
useful as a starting point to help figure out gene function and
pathway using additional experiments and analysis.
There are many methods for doing clustering - each with advantages and
disadvantages. We present four methods in MAExplorer and plan on
adding a variety of more powerful methods through the MAEPlugin
facility under development.
The first cluster method finds a cluster of genes whose expression
profiles are similar to that of the currently selected gene. This list
of genes is restricted by the constraint that the cluster distance
between each of these genes to the selected gene is less than the
"Cluster threshold" distance set by the user with a scroll bar. It
displays genes that are found both with blue boxes (the larger the
box, the higher the similarity) and in a text report window showing
the genes and their distances to the current gene. By varying the
threshold and observing the results, the user can find a set of highly
correlated genes. If the threshold is set to 0.0, no genes are
found. If it is set too high, all data filtered genes are found. So
it is critical to adjust the threshold to a reasonable level
commensurate with the type of data being analyzed and the approximate
number of genes expected.
A second cluster method draws blue circles in the array image around
all filtered genes meeting the threshold criteria, where the larger
the circle, the larger the number of similar genes (i.e. passing the
threshold) found to be clustered with that gene. Clicking on a
gene toggles between the first and second methods. For both of these
methods, it pops up a "Cluster Distance" threshold scroller and
recomputes the clusters if you change the scroller value or the current
gene. It also shows a text report that displays the number of genes
similar to each data filtered gene.
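The two threshold-based methods above can be sketched together in Java. The first finds the genes whose distance to the current gene is below the threshold; the second counts, for every gene, how many other genes fall within the threshold. The class and method names are hypothetical, and the Euclidean distance between expression profiles is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not part of MAExplorer) of threshold-based clustering. */
final class SimilarGenesSketch {

    /** Euclidean distance between two expression profiles (assumed metric). */
    static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - q[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Method 1: genes whose distance to the current gene is below the threshold. */
    static List<Integer> similarTo(double[][] profiles, int current, double threshold) {
        List<Integer> found = new ArrayList<>();
        for (int g = 0; g < profiles.length; g++) {
            if (g != current && distance(profiles[g], profiles[current]) < threshold) {
                found.add(g);
            }
        }
        return found;
    }

    /** Method 2: for each gene, the number of other genes within the threshold. */
    static int[] similarCounts(double[][] profiles, double threshold) {
        int[] counts = new int[profiles.length];
        for (int g = 0; g < profiles.length; g++) {
            counts[g] = similarTo(profiles, g, threshold).size();
        }
        return counts;
    }
}
```

Because the comparison is a strict "less than", a threshold of 0.0 finds no genes, and a very large threshold finds every other filtered gene, matching the behavior described above.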
A third method, called "K-means" clustering, finds K genes (which we
call primary nodes) whose expression profiles are most orthogonal to each other. It
uses the current gene as the first or "seed" node. It then finds the
gene furthest from this and assigns it as node 2. Then the gene
furthest from both nodes 1 and 2 is assigned to node 3, etc. This
process is repeated until all K nodes are assigned. Then the
remaining genes are assigned to the closest node. Having defined the
initial cluster centers, it recomputes the centroid of each of the
clusters. The centroid can alternatively be computed using a median
instead of a mean in which case we would be doing K-median clustering
(Bickel, 2001). K genes are
then reassigned to the nearest new centroids as the new K-means node
instances. Finally, the remaining genes are assigned to the nearest
centroid. A scrollable K-means cluster text window report pops up
with genes sorted by cluster. Clicking on a gene in either the array
image or scatter plot assigns all genes in the cluster to which that
gene belongs to the "current cluster". Genes in the current cluster
are labeled in the array and scatter plot with a small number of the
cluster. In addition, genes in the current cluster are copied to the
E.G.L. where they can be used in a report, saved in a named gene set,
or used for additional filtering. It also pops up a "N-clusters"
scroll bar window to let you dynamically adjust the number of
clusters. Changing N will recompute the clusters. When the K-means
is recomputed, it uses the current gene as the initial seed gene.
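The seeding and assignment steps described above can be sketched in Java as follows. This is a simplified, single-pass illustration under our own assumptions: the class and method names are hypothetical, distances are Euclidean, and "furthest from the nodes chosen so far" is taken to mean the gene whose minimum distance to those nodes is largest. It is not the MAExplorer code.

```java
/** Hypothetical single-pass sketch (not part of MAExplorer) of seeded K-means. */
final class KMeansSketch {

    static double dist(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += (p[i] - q[i]) * (p[i] - q[i]);
        }
        return Math.sqrt(sum);
    }

    /** Picks K primary nodes: the seed, then repeatedly the gene furthest from all nodes so far. */
    static int[] pickNodes(double[][] profiles, int seed, int k) {
        int[] nodes = new int[k];
        nodes[0] = seed;
        for (int i = 1; i < k; i++) {
            int best = -1;
            double bestD = -1.0;
            for (int g = 0; g < profiles.length; g++) {
                double minD = Double.MAX_VALUE;
                for (int j = 0; j < i; j++) {
                    minD = Math.min(minD, dist(profiles[g], profiles[nodes[j]]));
                }
                if (minD > bestD) {   // assumption: maximize the minimum distance
                    bestD = minD;
                    best = g;
                }
            }
            nodes[i] = best;
        }
        return nodes;
    }

    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (dist(p, centers[c]) < dist(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Returns the cluster index (0..K-1) of every gene. Requires K <= number of genes. */
    static int[] cluster(double[][] profiles, int seed, int k) {
        int n = profiles.length;
        int dim = profiles[0].length;
        int[] nodes = pickNodes(profiles, seed, k);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = profiles[nodes[c]].clone();
        }
        int[] assign = new int[n];
        for (int g = 0; g < n; g++) {
            assign[g] = nearest(profiles[g], centers);
        }
        // Recompute each centroid as the mean of its cluster members.
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int g = 0; g < n; g++) {
            counts[assign[g]]++;
            for (int d = 0; d < dim; d++) {
                sums[assign[g]][d] += profiles[g][d];
            }
        }
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim && counts[c] > 0; d++) {
                centers[c][d] = sums[c][d] / counts[c];
            }
        }
        // Reassign all genes to the nearest recomputed centroid.
        for (int g = 0; g < n; g++) {
            assign[g] = nearest(profiles[g], centers);
        }
        return assign;
    }
}
```

Replacing the mean in the centroid step with a per-dimension median would give the K-median variant mentioned above.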
The fourth method is a hierarchical clustering method that generates a
clustergram and dendrogram similar to that of Eisen's red-black-green
clustergram (Eisen, 1998). This
was derived from the clustered correlation map (ClusCor) of Weinstein
et al. (Weinstein, 1997). The
MAExplorer clustergram and dendrogram are dynamic and may be
interrogated and used to set the current gene. This means that it may
also position a corresponding ordered list of expression profile plots
to the same gene so you may view the data as a plot as well. The
dendrogram may be zoomed in to explore a part of the dendrogram in
more detail. As with the K-means clustering, a report can be made of
the ordered genes.
Then, the expression profile is expressed as a list of values:
For scaled data such that dpq has a maximum value of 1.0
over all samples. A similarity measure could be computed as
1.0 - distance or
D can get quite large for clustering a large number of genes
N [for N=5000, this is > 50 Mbytes!]
The following is a simplified definition of one way to compute a
hierarchical clustering of gene expression profile data.
If there is only one field in the array, it will appear as field 1. In
the above example, [1-A4,3] is field 1 grid A
row 4 and column 3. Note that the pseudoarray coordinates are for
visualization purposes in MAExplorer and may or may not be the same as
the coordinates on the actual array. That depends on how the
MAExplorer database was defined in the configuration file described in
Appendix C.
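If you need to read such pseudoarray coordinates in your own code, a string such as [1-A4,3] can be split into its parts with a regular expression. The following Java sketch is only an illustration: the class name and field names are hypothetical, and it assumes the grid is written as one or more letters immediately followed by the row number, as in the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser (not part of MAExplorer) for coordinates such as [1-A4,3]. */
final class SpotCoordSketch {
    // Assumed layout: [field-GRIDrow,column]
    private static final Pattern COORD =
        Pattern.compile("\\[(\\d+)-([A-Za-z]+)(\\d+),(\\d+)\\]");

    final int field;
    final String grid;
    final int row;
    final int col;

    private SpotCoordSketch(int field, String grid, int row, int col) {
        this.field = field;
        this.grid = grid;
        this.row = row;
        this.col = col;
    }

    static SpotCoordSketch parse(String text) {
        Matcher m = COORD.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a spot coordinate: " + text);
        }
        return new SpotCoordSketch(
            Integer.parseInt(m.group(1)),
            m.group(2),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4)));
    }
}
```

Parsing "[1-A4,3]" with this sketch yields field 1, grid A, row 4 and column 3.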
When the current gene is defined, it will draw a yellow (green) circle
around the spot in the ratio (intensity) pseudoarray image and display
other features of the gene in the three-line status area near the top
of the main window. If background correction is enabled (the "Use
background intensity correction" in the Normalization menu), then spot
intensity values will appear as intensity' (with background
intensity subtraction) and intensity (without background
subtraction).
There are a number of different reporting formats available depending
on the array display mode and particular normalization method
selected. These include: the pseudoarray image of the intensity of a
single sample, the pseudocolor ratio X/Y or Zdiff (X-Y) image (using
either HP 'sets' or single samples), or the ratio of Cy3/Cy5 for
dual-labeled dyes or F1/F2 for replicate spots for a single sample.
In addition, the normalization mode is also displayed in the reporting
line. We will present examples of each of these different reporting
formats.
You may show the intensity data for a particular spot in the currently
displayed pseudoarray image. First select the "Pseudograyscale image"
option in the "Show Microarray" submenu in the "Plot menu". If your
data has duplicate grids (i.e. fields F1 and F2) then you may look at
F1, F2 and mean (F1+F2)/2 data in the reports when you click on a
spot. If the "Gang F1-F2 scrolling" switch is disabled in the "View
menu", then the intensity value is the intensity data value
for the gene at that location. If the "Gang F1-F2 scrolling" switch is
enabled, then it reports intensity[F1], intensity[F2], and the F1/F2
ratio. These two formats are shown in the following two examples for a
C57B6 pregnancy day 13 sample in the MGAP database:
a) Field F1 spot for a single spot in a single sample with the median
intensity selected.
c) Ratio data for two samples X and Y in separate hybridized
arrays. Ratio data for the field F1 and F2 spot data as well as the
mnX/mnY ratio is reported. The median normalization was used in this
example.
f) Multiple HP-XY 'sets' using median normalization for the pseudoarray
image display for the HP-X 'set' of three C57B6 samples.
j) Multiple HP-XY 'sets' p-value using median normalization for ratio
(HP-X/HP-Y) data for the "Pseudocolor (HP-X,HP-Y) 'sets' p-value"
display.
For the intensity and ratio threshold filters, the range
interpretation may be inside or outside the specified range. The
ratio range [R1:R2] is between 0.01 and 100.0. The Zdiff range [Z1:Z2]
and [CZ1:CZ2] are between -4.0 and +4.0. The intensity threshold range
[I1:I2] is set to the dynamic range of the min and max intensity for
the current normalization method.
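The inside/outside interpretation of a range filter can be sketched in Java as follows. The class and method names are hypothetical, and treating the range limits as inclusive is our assumption; only the slider limits (0.01 to 100.0 for ratios, -4.0 to +4.0 for Zdiff) come from the text above.

```java
/** Hypothetical sketch (not part of MAExplorer) of an inside/outside range test. */
final class RangeFilterSketch {

    /** True if value passes a [low:high] range filter in the given mode. */
    static boolean passes(double value, double low, double high, boolean inside) {
        boolean inRange = value >= low && value <= high;   // inclusive limits are an assumption
        return inside ? inRange : !inRange;
    }

    /** Keeps a ratio slider value within the documented 0.01 to 100.0 limits. */
    static double clampRatio(double r) {
        return Math.max(0.01, Math.min(100.0, r));
    }

    /** Keeps a Zdiff slider value within the documented -4.0 to +4.0 limits. */
    static double clampZdiff(double z) {
        return Math.max(-4.0, Math.min(4.0, z));
    }
}
```

With the range [0.5:3.0], a ratio of 2.0 passes in "inside" mode and fails in "outside" mode, while a ratio of 5.0 does the opposite.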
A list of possible threshold sliders is shown in the following table.
When a Filter is enabled that requires a slider, it pops up the
State Scrollers window that contains one or more
sliders. When you disable all filters that use these sliders, the
popup window will disappear. The corresponding Ratio R1[R2] or
Zdiff Z1[Z2] sliders are used if you are using a ratio or
Zscore normalization - and will change if the normalization changes
while the filter is active.
Some of the sliders are implemented with a non-linear scale so that
you have more resolution at the low end (eg. p-Value, Spot CV, Diff
HP-XY).
Depending on the set of data Filters selected, there may be multiple
sliders present in the State Slider popup window (eg. see Figure
2.4.3).
Table 3.3.1. List of threshold sliders. Sliders are enabled in
the State-Scroller popup window when the corresponding data filters
are enabled.
If you are running on a windowing system supporting cut and paste,
then you may cut and paste data from reports and plots into
applications on your system that allow you to save or print this
data. Set the Report menu table-format to "Tab-delimited". Then, in
Windows 95/98/NT/2000/XP, cut data from the popup tables (or other text
reports) and paste it into Microsoft Excel. In Windows, you can
capture (i.e. "cut") the entire screen by pressing the "Prt Sc" or
print screen button. To capture a specific window (e.g. a scatter
plot), hold the "Alt" key when pressing the "Prt Sc" key. Then go
into a Windows imaging application (such as PhotoShop) and paste it
into the application. In PhotoShop, in the File menu, select New (or
type Control/N). Then when the window is opened, click on the window
and paste the MAExplorer screen you had cut into the image window by
typing Control/V. In both Excel and PhotoShop you may print the data
or save it in a file.
Section 4.1 discusses known bugs, and Section 4.2
lists the revision notes for older versions. If you have experienced bugs
with an older version of MAExplorer, you might check the revision notes
to see if the bug was fixed and download a new version. Section 4.3
discusses problems in using MAExplorer
as an applet with Web browsers. Section 4.4 describes handling fatal "DRYROT" errors.
If you encounter a fatal error that is detected by MAExplorer, it will
pop up an error reporting
window. Please E-mail this data to us so we can try to resolve the
problem.
In the meantime, partially implemented commands are disabled to keep
you out of trouble :-) ...
You can help us and get MAExplorer to do more of the things you
would like to see. Let us know of problems that you encounter as well
as suggestions for changes or new methods you would like to see -
send us E-mail.
If you are experiencing Web browser problems using the MAExplorer
applet, you might check the discussion of possible solutions.
Figure 4.4 Example of a fatal Dryrot Error window. This may occur for
a variety of reasons. This window lists the main reason and also lists
some of the MAExplorer state information. If you wish, you may save this
window (press the "SaveAs" button) and mail it to us. We may try to
correct the problem in the next release if it is a problem with MAExplorer.
Alternatively, it could be a user data error.
Figure 4.4.1 Example of a fatal Dryrot Error window after SaveAs.
This tells you where the error message file was saved and
the email address to send it to if you wish.
Primary contributors to Cvt2Mae were Peter Lemkin (LECB/NCI), Greg
Thornwall (SAIC/FCRDC), Bob Stephens (ABCC/NIH).
We wish to thank the many members of Lothar Hennighausen's
Laboratory of Genetics and Physiology (NIDDK) who inspired the
initial development of MAExplorer and its continued
development. Thanks also to:
Greg Alvord (SAIC/FCRDC),
Thanks also to Jeff Thomas, Charmaine Richman, and Tom Stackhouse
(NCI) for helping with the MAExplorer Open Source process.
This tutorial lets you
There is also a
pre-computed example of an Ordered Condition List using 4
conditions of replicates of C57B6 (pregnancy day 13, lactation days 1
and 10, and stat5a(-,-)) with 15 samples. The database also includes 4
additional condition sets of this data and an Ordered Condition List
of the 4 conditions (in the State/ directory). This may be used to
demo the OCL F-test filter.
If you have access to another MAExplorer database, you can use it
instead since the tutorials are fairly generic.
Using the stand-alone application for the tutorial
These same subsets as well as other subsets of the MGAP data are
available in the set of .mae startup files distributed with
MAExplorer. To access these files,
First, select one of the start up databases.
When it starts, a main window will pop up. It then downloads the gene
database tables and the particular hybridized samples you specified.
When it is ready for you to begin interaction, the menu bar will
become active and it will display a green Ready -
click on a gene to query database message. Depending on your
Internet connection speed, it may take a few minutes to set up. If you
are running MAExplorer as a stand-alone application and it is getting
data from your local disk, startup will be much faster.
Second, go to the A.3 instructions for the
self-guided tutorial below to see what to do next.
HINT: print this tutorial page and then read
the following instructions from the printout rather than trying to
keep this window visible. You might also print the parts of the
MAExplorer Reference Manual for the same reason.
HINT: You might want to keep a record of the
commands you have used or the messages and measurements you have
made. To do this you need to enable message and command history
logging. Go to the View pull-down menu and then select the type of
logging you want using the
Show log of messages or the Show log of command history
commands.
NOTE: On computers with low resolution
(i.e. less than 1024 X 780) you may need to resize the windows and
move them to different parts of the screen to view them
simultaneously.
step 1: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y.
step 1: go to Analysis: Plot: Scatter plots: Cy3 vs. Cy5
If you are working with Cy3/Cy5 dye-swap data, you may swap the
Cy3/Cy5 channel data to Cy5/Cy3 for any selected subset of
samples. This may make it easier to use the data in various ways when
data mining. If you do not have this type of data, go to step 7.
step 5': go to Samples: Edit (Cy5/Cy3) else use (Cy3/Cy5) menu
Note of caution: if the signal is close to background the X/Y ratio
may be bogus.
In the Filter menu, add the "Filter by ratio or Zdiff sliders". Then
the [R1:R2] ratio range sliders are added to the state slider window
and may be used for filtering genes. If the normalization method is
one of the Zscore methods, it filters by the difference of the Zscores
and the [Z1:Z2] range is used; otherwise it filters by the ratio. Note
that the genes that pass the filter are marked with a red circle in
the grayscale intensity pseudoarray image, a white circle in the
pseudocolor ratio image, or a red "+" in the scatter plots, so you
might try moving the controls while in
those plot modes. Try some of the other filters. The spot CV test
removes genes where replicate spot values (F1 and F2 in the case of a
single sample or replicate samples in the case of HP-X and HP-Y 'sets'
or the HP-E 'list' of genes) are not well correlated. The t-Test filter
may be used with sets of X and Y samples to find genes with a
p-value less than the specified threshold.
Turn on one or more Filters
to reduce the number of genes to say under 100 (e.g. t-test or spot CV
filters). Then press the "Go 'Cluster all genes'" button in the
cluster window. This is equivalent to invoking the "Cluster
counts of Filtered genes by expression profiles" command from the "Cluster plots" submenu.
Notice that the Filtered genes have blue circles of different sizes. The
larger the circle, the more genes there are that are similar to that
gene. Move the cluster threshold slider and note that as the number of
similar genes changes, the size of the blue circles changes too. As
with the other cluster mode, you may generate a report of sorted
cluster counts. Click on a gene with the largest
green circle. This will then switch you back to single gene
clustering mode where you can investigate that gene in more detail.
MAExplorer requires a specification of array geometry and
quantification information. These are defined in a
configuration startup file. The startup file contains the
initial list of hybridized samples to be loaded, and other
parameters such as the name of the configuration file (if it is
different from the default name). The stand-alone application reads the
.mae startup file (the applet reads its PARAM list) when it is
started. The configuration file contains various
defaults. If any of these are specified in the configuration file, they
override the built-in default values. Values from the stand-alone .mae
startup file or the applet PARAMs in turn override the configuration
file values.
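The order of precedence described above (built-in defaults, then the configuration file, then the .mae startup file or applet PARAMs) can be sketched in Java as a simple layered merge. The class name and method are hypothetical and the parameters are treated as plain name/value strings for illustration; this is not how MAExplorer itself stores its state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not part of MAExplorer) of parameter precedence. */
final class ParamPrecedenceSketch {

    /**
     * Later layers override earlier ones:
     * built-in defaults < configuration file < .mae startup file or applet PARAMs.
     */
    static Map<String, String> resolve(Map<String, String> builtIn,
                                       Map<String, String> configFile,
                                       Map<String, String> startup) {
        Map<String, String> merged = new HashMap<>(builtIn);
        merged.putAll(configFile);   // the configuration file overrides built-in defaults
        merged.putAll(startup);      // startup values override the configuration file
        return merged;
    }
}
```

A parameter that appears only in the built-in defaults keeps its default value, while one that appears in all three layers takes the value from the startup file or PARAM list.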
A few additional files are required and are defined in the
configuration file. These include: a Gene-In-Plate-Order or
GIPO file; a samples database file listing names of the
samples available for loading; and a gene class names
file. An optional (but deprecated) extra array information file
may be specified to access additional data about samples. Quantified
hybridized sample array spot data (Quant files) from each array
is put into a separate data file. Note that all data files are
tab-delimited files such as may be generated with Excel, relational
databases or directly from array spot quantification software.
Hybridized sample arrays must be scanned and then spots quantified
using other software. MAExplorer does not do spot quantification from
scanned image files. However, MAExplorer can use spot data from a
variety of array image quantification programs that generate
tab-delimited data files. The data needs to be converted to the
MAExplorer schema described in this Appendix.
The derivation of quantified spot data
files from hybridized sample arrays is discussed later in this
section, as is the Quant file data
format.
The configuration file is created once for each new array GIPO
geometry and database of hybridized samples. It is independent
of the number of samples. Configuration parameters include array
geometry (# of grids, # of duplicate spots/gene, etc), whether the
data is intensity or ratio data (e.g. Cy3/Cy5), etc. The configuration
file may also include labeling, quantification dynamic range, default
analysis thresholds, mapping of user data file table-field names to
expected MAExplorer names for the GIPO and quantification files,
additional database-specific pull-down menu plugins, names of gene
sets and sample condition lists, etc.
The GIPO file is independent of the number of array samples and
describes the mapping between spot position in an array and its gene
identification as well as corresponding data such as original plate
number, row and column; UniGene ID, GenBank ID, dbEST ID, etc. These
files will be described in more detail including how one can create
the necessary database files that MAExplorer requires for use with
various types of microarray data.
Figure C.1 Directory structure of stand-alone databases required by
MAExplorer. The "/Config", "/Quant", and "/MAE" directories are
required. The /MAE directory is only used with the stand-alone version
with .mae files, not
for the applet. [When used with an applet, the main path is the path
of the download JAR file and .mae files are not used.] The "/Report",
and "/State" directories are created by MAExplorer as needed and the
user need not create them prior to running MAExplorer. The text
reports and plot GIF images are saved in the /Report folder when you
"Save" a report or plot. When you "Save" the current database session
(File | Databases | Save ...), the gene sets and sample lists are
saved in the /State folder for use when you restart MAExplorer on the
.mae startup file. The optional "/Cache" directory is only used (and
then, only optionally) when downloading data from a Web server. The
optional "/Image" directory is only used if JPEG images of the arrays
are provided; their resolution and alignment must correspond
to the (X,Y) spot data in the Quant files. The "/Plugins" directory is
where the
MAEPlugins packaged with MAExplorer are normally kept and where
MAExplorer looks when you attempt to load a plugin. Since you can
browse your file system, they do not have to appear here.
Sample MGAP database configuration, quantification data and startup
files are available for use as examples with which to make your own
files or for inspection.
Similarly, when the entire database is saved (File | Databases |
SaveAs ...DB) into a .mae startup file, the set of gene set files are
saved as ".cbs" files and the set of condition list files are saved as
".hbl" files in the "State" subdirectory. These are automatically
reloaded into MAExplorer when the .mae startup file is used to restart
MAExplorer.
If your array data has JPEG or similar images of the original arrays,
they should be saved in the "Images" directory. For example, the
NCI-CIT mAdb database server allows you to download sampled images for
your data in an "Images" subdirectory at the same time you download
the other MAExplorer data files. The images can then be used by
various MAEPlugin programs. If your quantified data converted to
.quant files has (X,Y) coordinates corresponding to spots in these
images, then you may be able to use the Montage MAEPlugin to show
where the current spots are in sub-regions of all of the input
images. This plugin will be available on the MAEPlugin Web site when
we release the MAEPlugin facility for Beta-testing.
For a specific database (db), make sure the names of the configuration
files in /Config directory are entered in the
MaExplorerConfig-db.txt file for that database. You may have multiple
databases in the same /Config, /Quant and
/MAE directories if the file names do not conflict. The trick
is to have the .mae startup file in the /MAE
directory point to the specific configFile to be used. Since
MAExplorer reads the MaExplorerConfig-db.txt file when it first starts
up, it discovers the names of the other database files. If there are no
name conflicts, then there is no problem mixing data.
Each spot data (.quant) sample file has a name which must be entered
in the Database_File field of the Samples-db.txt row entry
for a new sample. The Sample_ID field is a descriptive name
of that sample.
Often GIPO files supplied by array vendors have additional fields not
currently used by MAExplorer. You can leave them in (they will be
ignored) or take them out (loading a database is faster).
If the field headings in a user's tables are not the same
as those required by MAExplorer, you can easily fix this by adding
(Table,Field) mapping entries to your version of the
MaExplorerConfig-db.txt file (see mapTF
for examples).
Note that the optional Menu_Source_Name entry in the
Samples-db.txt file specifies the sub-menu, if any, under which the sample
will appear in the By Source sub-menu of the Samples menu.
If the optional extra sample information file is used, then make sure
the sample names and database file names are the same, and that there
are corresponding rows in each table.
A typical sample database table might look like:
You may optionally include a Database_ID field. For example:
The Database_ID may be useful if there are file name length problems on
some systems (e.g. MacOS 8-9). In that case, we offer the option of using the
Database_ID as the file name for the .quant (Quant/ directory) and
.jpg (Images/ directory) files rather than the Database_File name. For
example, one could specify "Quant/270314.quant" and
"/Images/270314.jpg" rather than the default "Quant/control1.quant"
and "/Images/control1.jpg" names.
The Samples database table includes some required as well as optional
fields (see Table C.2.1.1):
Table C.2.1 List of Samples data file table fields. The Samples table
lists hybridized samples that are accessible to the user and
may be loaded into a database session if they wish.
(See Section C.1.1 for option notation.)
Table C.2.1.1 List of optional Samples data file table fields.
These fields may be used for some additional operations. If they are
not in the Samples DB table, then the operations will not be
available.
(See Section C.1.1 for option notation.)
Table C.2.1.2 List of optional Samples data file table fields.
These fields are not currently used in any computations but are
returned in the Sample Array
report in Section 2.4.6.1.
The basic Quant spot data file table includes entries listed in Table
C.3.1:
Table C.3.1 List of Quant data file table fields. This specifies the
spot quantification data. There may be one or more spots, corresponding
to the same gene, on each row.
(See Section C.1.1 for option notation.)
Note: If NAME_GRC is specified (eg. for use with ImageQuant-NT data),
then the explicit (grid, grid row, grid col) fields are not
required. Note: For [G grids, R rows and C columns], this would cover
a set of spots in the range [1,1,1] through [G,R,C].
Note: If Cy3/Cy5 double fluorescent labeling is used, then the
RawIntensity1 and RawIntensity2 fields may be replaced with Cy3RI and
Cy5RI names and the (RawIntensity1, RawIntensity2) fields mapped to
(Cy3RI, Cy5RI) in the configuration file mapTF entries (table C.5.4 below).
(See Section C.1.1 for option notation.)
Data is extracted from a table created from the gene-in-plate-order
(GIPO) gene coordinate table. This links spots in a microarray to
these Genomic "gene ID"s and gene names. This table may contain
Clone ID, GenBank, dbEST, UniGene IDs, LocusID corresponding to these
Master Gene IDs. An optional table of Clone IDs and Gene Classes the gene
belongs to may also be defined.
A typical GIPO database table might look like:
The basic GIPO table includes the following fields:
For example, if grid #, row# and column# are (8,12,11), then it codes it as
We are developing tools for creating and editing the configuration
file. In the meantime, edit the file with Excel and save the finished
table as a tab-delimited text file with the name
MaExplorerConfig.txt (in the Config sub-directory in
the directory where your database is stored).
Table C.5 List of Configuration data file table fields.
The following sub-tables list the configuration parameters and
some typical values that might be included. These examples illustrate
the variety of parameter options with examples of values that
might be used. Required entries are listed at the tops of the tables.
A typical MAExplorer minimal configuration database table might look
like:
The following table fields may be mapped. Note: mapping is
required only when the table field names of your data files are
different than the internal MAExplorer table field names.
The following is an example of some of the parameters that might be
added to the Configuration file to perform field name mappings. Note:
these mappings are only required if the data field names are non-standard.
This shows some typical field name mappings. It will not be the same
for your data.
(See Section C.1.1 for option notation.)
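The field-name mapping described above can be pictured as a lookup from a user's column heading to the name MAExplorer expects. The following is only a minimal Java sketch of that idea, not MAExplorer's own implementation. The class name FieldNameMapper, its methods and the "OtherField" heading are hypothetical; only the pairing of (Cy3RI, Cy5RI) with (RawIntensity1, RawIntensity2) comes from the note in Section C.3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: renames user column headings to the names MAExplorer expects. */
class FieldNameMapper {
    // user field name -> expected MAExplorer field name (hypothetical in-memory form)
    private final Map<String, String> userToMae = new LinkedHashMap<>();

    void addMapping(String userName, String maeName) {
        userToMae.put(userName, maeName);
    }

    /** Returns a copy of the header row; headings without a mapping pass through unchanged. */
    String[] mapHeader(String[] userHeader) {
        String[] out = new String[userHeader.length];
        for (int i = 0; i < userHeader.length; i++) {
            out[i] = userToMae.getOrDefault(userHeader[i], userHeader[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        FieldNameMapper mapper = new FieldNameMapper();
        mapper.addMapping("Cy3RI", "RawIntensity1");
        mapper.addMapping("Cy5RI", "RawIntensity2");
        // "OtherField" is a made-up heading that needs no mapping
        String[] mapped = mapper.mapHeader(new String[] {"Cy3RI", "Cy5RI", "OtherField"});
        System.out.println(String.join("\t", mapped));
    }
}
```

Headings that already match the expected names need no entry, which is why mappings are only required for non-standard field names.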
Therefore we have created a
Cvt2Mae is a "Wizard" driven process designed for use by molecular
biologists. It handles commercial chips such as Incyte, Affymetrix,
GenePix, Scanalyze, etc. or one-of-a-kind academic chips. It asks you
questions to describe your chip and your data. We call the chip
description the "Array Layout". After you have created or edited an
array layout, you may save it for use in future conversions. [The
array layouts are kept in a subdirectory "ArrayLayout" in the
directory where you installed Cvt2Mae.] Since an ArrayLayout is a
file, you could mail it to a collaborator. After you have answered the
questions, you then run the converter and it generates the proper set
of converted data files. In the case of user defined array layouts,
we denote the latter as <User-defined> where the user assigns a
name to that layout as part of the description. Essentially, the array
layout contains a set of "rules" for describing the user's array data
so Cvt2Mae knows how to read it. At some point, we plan to add the
MAGE-ML standard to Cvt2Mae as one of the array layouts so it should
be able to handle a wider variety of data.
Figure C.6.1 The Cvt2Mae array data converter.
The details on Cvt2Mae including more description, PDF examples of conversions
for several different types of arrays, the download area, status of the converter,
etc. are available on the
This section discusses the installation of MAExplorer as a stand-alone
application on a variety of computers. Since Java is portable between
Microsoft Windows (95/98/NT/2000/XP), Macintoshes, Linux, Solaris,
etc., it is possible to freely download and install MAExplorer and
Cvt2Mae on your computer and run it as an application program.
There is a discussion on using it with other arrays (Appendix C) that requires
editing data files for use with MAExplorer. An array data conversion
tool is being constructed which will automate this process in the
future.
Figure D.1 Web page showing options for installing MAExplorer as a
stand-alone application. Installers are available for
Windows95/98/NT/2000/XP, Mac OS, Solaris, Linux, Unix, and other Java
enabled platforms.
You first need to download MAExplorer for your particular type of
operating system. These include Windows 95/98/NT/2000/XP, MacOS for
Power PC, Sun Solaris, HP-UX, other Unix versions (e.g. Linux,
etc). The Windows, MacOS and Solaris versions include a Java run-time
(Java Virtual Machine) that works with MAExplorer. We recommend you
download the full distribution for your computer (which includes a
recent Java virtual machine (JVM) if it exists). This ensures proper
operation of MAExplorer and does not interfere with other Java
applications you might have installed or will install.
This installation process uses a commercial "Java Installer" (InstallAnywhere(TM) by
ZeroG Inc.) that requests you "Grant" it permission to save the
installation on your computer. It will suggest where to install it or
you can install it wherever you want. For example, in Windows it may
suggest saving it in C:\Program Files\MAExplorer\ - you can
specify an alternative directory if the default disk does not have
much free space left.
You can let your Unix system find MAExplorer by putting it in your
path variable in your login or shell startup script.
Then, you would start it by specifying the startup file residing
in the MAE/ subdirectory as:
There is a set of sample MGAP .mae files in the MAE
subdirectory in the downloaded installation.
The .mae startup files are simply tab-delimited ASCII files
with a .mae file extension. These could be created or edited either
manually (e.g. using Microsoft Excel and saving the file as a
tab-delimited file) or by various database programs (eg. the NCI/CIT
mAdb program, the MAExplorer Cvt2Mae program being developed). They
may also be generated by MAExplorer (File:Database:Save as file DB).
The .mae file form consists of two tab-delimited columns containing
fields Name and Value. These field names appear in the
first row. This is followed by instances of the various parameters.
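Because the .mae form is just two tab-delimited columns, it can be read with very little code. The following is a minimal Java sketch under stated assumptions, not the parser MAExplorer itself uses. The class name MaeStartupReader is hypothetical, the header row is skipped without being validated, and the configFile entry in the demo is only an illustrative parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative reader for a two-column (Name, Value) tab-delimited startup file. */
class MaeStartupReader {
    /** Parses the file text; the first row holds the field names and is skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] lines = text.split("\\R");          // split on any line terminator
        for (int i = 1; i < lines.length; i++) {     // i = 0 is the "Name<TAB>Value" header
            String line = lines[i];
            if (line.trim().isEmpty()) {
                continue;                            // ignore blank lines
            }
            String[] cols = line.split("\t", 2);
            String value = cols.length > 1 ? cols[1].trim() : "";
            params.put(cols[0].trim(), value);
        }
        return params;
    }

    public static void main(String[] args) {
        String text = "Name\tValue\nconfigFile\tMaExplorerConfig-db.txt\n";
        System.out.println(parse(text));
    }
}
```

A LinkedHashMap is used so the parameters keep the order in which they appear in the file.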
A simple .mae file is shown in Table D.4 using a 4 sample
Lactation database subset from the MGAP database.
Although any of the configuration file values can be specified in the
.mae file, some of the more common optional parameters are
listed in Table
D.4.1.
Table D.4.1 Some of the common optional entries for .mae startup
files. These entries are shown with example values for the 4
samples in Table D.4. See Appendix C.5 for lists of many
other options.
The following is a simple example of HTML code containing an applet
which will invoke MAExplorer. You may add other options with PARAMs
in the Applet (or for that matter in the .mae startup file) that
override any options normally specified in the Configuration file (See
Appendix C.5).
Another major decision was to use multiple pop-up windows for 2D
plots, histograms, expression profiles, clustergrams, reports, dialog
boxes, etc. rather than sharing a single window. These windows are
maintained by a special pop-up registry that handles many of the
bookkeeping chores involved with tracking and updating multiple
windows viewing the same underlying data. Whenever an event occurs
which may change the set of data filtered genes, the current gene or
the current cluster set of genes, the registry is notified. Some of
the events are the current clone changed, the Filter parameters
changed, the sample labels changed, the normalization method changed,
etc. It in turn notifies all relevant active plots, tables and
reports - requesting them to update themselves if necessary. This
object-oriented design greatly simplifies the process of synchronizing
the various data presentations with changes in the database.
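The registry behaviour described above is essentially the observer pattern. The following is a minimal Java sketch of that pattern, not the actual MAExplorer pop-up registry. The names PopupListener, PopupRegistrySketch and CountingWindow, and the method signatures, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface for a window that wants to hear about data changes. */
interface PopupListener {
    void updateCurGene(int geneId);
    void updateFilter();
}

/** Illustrative registry: tracks active windows and notifies them of events. */
class PopupRegistrySketch {
    private final List<PopupListener> active = new ArrayList<>();

    void register(PopupListener window)   { active.add(window); }
    void unregister(PopupListener window) { active.remove(window); }

    void currentGeneChanged(int geneId) {
        // iterate over a copy so a window may unregister itself while being notified
        for (PopupListener w : new ArrayList<>(active)) {
            w.updateCurGene(geneId);
        }
    }

    void filterChanged() {
        for (PopupListener w : new ArrayList<>(active)) {
            w.updateFilter();
        }
    }
}

/** Toy window that only counts the notifications it receives. */
class CountingWindow implements PopupListener {
    int geneUpdates = 0;
    int filterUpdates = 0;
    int lastGene = -1;

    public void updateCurGene(int geneId) { geneUpdates++; lastGene = geneId; }
    public void updateFilter()            { filterUpdates++; }

    public static void main(String[] args) {
        PopupRegistrySketch registry = new PopupRegistrySketch();
        CountingWindow plot = new CountingWindow();
        registry.register(plot);
        registry.currentGeneChanged(7);
        registry.filterChanged();
        System.out.println(plot.geneUpdates + " gene update(s), " + plot.filterUpdates + " filter update(s)");
    }
}
```

Each window decides for itself whether an event requires a redraw, so the code that changes the data does not need to know which plots or reports are open.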
A good intersection of the server-centric and client-centric methods
is to distribute the computation and data to the systems where they
can be handled most effectively. Because Java enables computation in
a Web browser, PCs currently available have enormous power and memory,
and high-speed Internet connections are readily available, it is now
possible to distribute some of the data and computations to the
desktop. If high-speed direct manipulation methodology is to be made
available on the Internet for microarray data mining, then it must be
brought to the user's desktop browser or local computer rather than
residing solely on the back-end server. This is the approach taken in
designing the MAExplorer.
Table E.2 Comparison of client-centric vs. server-centric data mining.
The table shows a comparison of some of the features of client-centric
and server-centric (using CGI and/or Applet) data mining analysis
methods. The client-centric approach presented here primarily uses
Java with data downloaded to the client's computer. A server-centric
approach might use a mix of HTML, CGI, servlet and Java. However, even
a client-centric approach may take advantage of server support for
additional functionality (e.g. accessing genomic servers to gain
additional information about specific genes or sets of genes).
The following figures show the top level plugin design.
Figure E.4.1 Overall MAEPlugin design for MAExplorer. Plugins
are dynamically loaded into MAExplorer where they may be invoked from
a menu entry or by various other means such as startup, normalization,
etc.
Figure E.4.2 Open Java API for MAEPlugins. Each type of application
could be derived from specialized Java classes that contain most
of the access methods required for that type of analysis.
In support of the MGAP server, additional software was written to
automate the pre-processing of the microarray quantitative data from
Research Genetics' Pathways array quantification analysis program and
perform compression and Web server updates for this data. The Web
server also hosts several common gateway interface (CGI)
programs. These include user login support, a Web proxy server (to
access other genomic Web sites from the Java applet), support of
login-protected user state file access, custom database creation, user
state files, and "groupware" user-access support.
This data may be used for learning about MAExplorer with the tutorials
and for investigating some of the stages of normal and mouse-model
mammary development. The MAExplorer reference manual may be viewed in your browser
from the Web from this Web site. Alternatively, you may download the
full manual as an Acrobat
MaeRefMan.pdf PDF file (> 5Mb).
If you have problems with the installation, then you might want to
read the rest of this section and also the part of the manual which
discusses installation (Appendix
D) and using it with your arrays (Appendix C). The latter
requires editing your data files for use with MAExplorer. Cvt2Mae, a "wizard" array
data conversion tool, automates this process.
If you have previously installed MAExplorer and you want to update
just the MAExplorer.jar
file (the actual program), you can do this as described in Section 1.3. Alternatively, you can use
the new "Update MAExplorer" command in the Files menu. This will (1)
backup the current MAExplorer.jar file as MAExplorer.jar.bkup; (2)
copy the latest MAExplorer.jar file from the
maexplorer.sourceforge.net Web site and replace your MAExplorer.jar
file in your installation directory. Then when you restart
MAExplorer, it will use the new version of the program.
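The backup-then-replace steps of the update can be sketched in a few lines of Java. This is only an illustration of steps (1) and (2) on the local file system, not the code behind the "Update MAExplorer" command. It assumes the new jar has already been downloaded to a local path, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Illustrative only: back up MAExplorer.jar, then replace it with a newer copy. */
class JarUpdateSketch {
    static void backupAndReplace(Path installDir, Path newJar) {
        Path current = installDir.resolve("MAExplorer.jar");
        Path backup = installDir.resolve("MAExplorer.jar.bkup");
        try {
            if (Files.exists(current)) {
                // (1) keep the old program as MAExplorer.jar.bkup
                Files.copy(current, backup, StandardCopyOption.REPLACE_EXISTING);
            }
            // (2) put the new jar in place of the old one
            Files.copy(newJar, current, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: JarUpdateSketch <installation dir> <downloaded jar>");
            return;
        }
        backupAndReplace(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the new version misbehaves, the backup can be copied back over MAExplorer.jar by hand.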
After initially installing MAExplorer (or Cvt2Mae for that
matter), you can simply download the
latest .jar file and overwrite the previous version you had when
you installed the program. The MGAP demo data can be downloaded separately.
Figure. Web page showing options for installing MAExplorer as a
stand-alone application. Installers are available for
Windows95/98/NT/2000/XP, MacOS-8/9, MacOS-X, Solaris, HP-UX, Linux, Unix,
and other Java enabled platforms. [Click on the figure to see a high
resolution version.] NOTE: the MacOS installer is
currently not available. If you have problems with the Sun installer,
you may need to update your Solaris OS system patches (see below).
2. You start the download process when you click on the installer for
your computer platform. (You may alternatively use the default installer discussed below.) Follow
the directions it provides as you download the installer. It also
provides instructions in the "View" hyperlink adjacent to the
operating system you selected that tells you what to do after you
finished the download. Part of the installation consists of telling
the installer where you want to 1) put the executable installer (a
temporary directory where you have lots of room is a good choice), and
2) the "installation" directory where you will typically leave the
distribution after the installer unpacks it.
We use the commercial
InstallAnywhere(TM) program to create the installers.
It provides installers for:
The default installer will put the installer executable in a fixed
directory and the installed MAExplorer files in another fixed
directory.
Note that the installers (where possible) will include a copy of a
recent Java Virtual Machine (JVM) from InstallAnywhere(TM)
to make running MAExplorer on your computer more robust. This is used
locally and only affects the running of MAExplorer. It will
not affect any other Java applications on your computer. In the
case of Mac OS, if you have an older version of the MRJ JVM, it will
ask you if you want to upgrade to the newer version (MRJ-2.4.5) -
however you do not have to unless you want to.
The MAExplorer
Reference Manual describes the details of MAExplorer as well as
showing a number of screens illustrating various data-mining
operations. Several tutorials are available and are discussed in the
Reference Manual.
If you are on a Macintosh system, then start MAExplorer and then run
the startup .mae file you want by going to the File menu and then the
Databases submenu. Use the "Open disk DB" option to browse your disk and
then open up the startup file of interest.
If you are on a Unix system, then you supply the MAE file explicitly
in the command line. You might consider adding the "installation"
directory to your UNIX $PATH or $path variable to
have UNIX automatically find the executable binary.
A: For Mac-X, with 256 character file names, this is not a
problem. For MacOS 8 and 9 with 32 character file names it may be a
problem. Because MAExplorer uses file extensions (eg. ".quant"), you
are currently limited to 25 characters or less. We will be modifying
the system to remove this limit.
Q: I tried unsuccessfully to open NCI/CIT mAdb data (nciarray.nih.gov)
on a Mac OS system. I generated a .zip file using mAdb "BETA Formatted
Array Data Retrieval Tool" , then decompressed this .zip file using
"Stuffit Expander" on my Mac. The Start.mae file could not be opened
by MAExplorer, what can I do to fix this?
A: Stuffit Expander (default settings) removes a form feed character from
decompressed text files. This prevents the Start.mae file (and other text files
used by MAExplorer) from being read by MAExplorer. To fix this, you need to set
Stuffit Expander so that it keeps the form feed characters when it
decompresses text files:
Q: How do I start MAExplorer on my data automatically by
double-clicking a Start.mae file on my Mac?
A: There is no easy way to do this at this time. Use the File menu,
Databases, Open Disk DB browser to specify the Start.mae file.
Default Lax settings
MAExplorer has a Java plugin extension facility. Plugins written for
MAExplorer are called "MAEPlugins". These MAEPlugins allow
investigators to extend the core capabilities of the MAExplorer program
themselves by writing special programs to implement new analysis
methods and access data from their MAExplorer database(s). The design of this plugin extension
enables users to write these new methods and have them added to the
MAExplorer menus or for plugins to be invoked when MAExplorer starts
up. In addition, default MAExplorer functionality could be changed by
replacing existing MAExplorer methods with user defined
methods. Writing a plugin to extend functionality using our Open Java API (Application
Programming Interface) is easier than understanding and modifying the full
MAExplorer program. This section of the Web site describes the API,
describes how to write a MAEPlugin, and gives examples of various
plugins. All source code is available on our
CVS Repository.
Keep checking this Web
page for the current status of the API as well as the MAExplorer Revision Notes
which gives a history of new features and changes to both MAExplorer
and the API.
MAExplorer is open source with a Mozilla 1.1 general
public license. However, we have made the MAEPlugins public
domain (a secondary license that is even less restrictive) with no
restrictions on their use. This enables the research community to
modify and help improve MAExplorer and the MAEPlugins as required. We
are dividing the plugins into those that are donated and those that
require interaction with the supplier. We hope that most plugin
developers will make them available as open source, but that is not a
requirement. If you are interested in writing a plugin or working with
us on this open source project please contact
us.
As we noted, the Open Java API is included in the regular .jar file
distributed when you download MAExplorer. The current MAExplorer jar
file may be downloaded from
MAExplorer.jar. You will also automatically download the jar
file when installing
MAExplorer. If you have MAExplorer installed, then you can use
the (File menu | Update MAExplorer from maexplorer.sourceforge.net)
command when running MAExplorer to get the latest MAExplorer.jar file
release.
This document discusses the paradigm for how MAEPlugins are used with
MAExplorer and the design used to give them access to MAExplorer
data. The first part discusses the top level
design and the second part gives an
example of using a plugin. The details on the internals for
MAExplorer itself are described in a Design doc (PDF) or (PPT). However, an
understanding of the MAExplorer internals is not required to write a
MAEPlugin.
Figure 1. Overall MAEPlugin design for MAExplorer.
Plugins are dynamically loaded into MAExplorer where they may be
invoked from a menu entry or by various other means such as startup,
normalization, data filtering, etc. Any number of plugins may be
loaded simultaneously. They may be loaded and unloaded dynamically,
and saved for automatic loading when the current database is saved.
Figure 2. Open Java API for MAEPlugins. Each type of
application could be derived from specialized Java classes that
contain most of the access methods required for that type of analysis.
The Gather - Scatter API is a means of "gathering" data from MAExplorer
internal data structures for the plugin. When a plugin wants to store data
back into MAExplorer, it is "scattered" back into the internal data structures.
This is implemented using the MaeJavaAPI and MJAxxx classes described in the
Open Java API.
Figure 3. Loading a MAEPlugin from your file system using
the Load Plugins command in the Plugins pull down menu. If you
have a plugin .jar or .class file, it may be specified using the "Load
plugin" command. This pops up a file browser to let you specify the
plugin file.
Figure 4. Executing the new command previously loaded in the
Plugin menu. Selecting the new "Show List Active Filters" command
that now appears in the Plugins menu invokes the plugin. This pops up
a report shown in the next figure.
Figure D.5. Popup window from executing the MAEPlugin.
This plugin gives a full report on the data Filter status in a new
pop up window.
This document describes the MAEPlugins Open Java API
(Applications Programming Interface) to enable researchers to write
their own MAEPlugins for use with MAExplorer. The Open Java API (or
API) is presented here as two javadoc trees.
The Open Java API is automatically included in the
MAExplorer.jar file. Although it wastes some space, we are
exporting the symbol tables with the files in MAExplorer.jar so that
you could use it with
a debugger (such as
Forte for Java Community Edition) to develop a
MAEPlugin.
This document briefly describes how to write MAEPlugins using the MJA
Open Java API (Application Programming Interface). It discusses key
issues to be addressed when writing a MAEPlugin and describes them in
sufficient detail to enable researchers to write their own MAEPlugins
for use with MAExplorer. Note that there are basically two types of
plugins: those which are one-shot plugins (e.g., popup a window with
its own user interface or perform an operation one time), and pipeline
operations. The latter include FilterPlugins and
NormalizationPlugins. These are inserted by MAExplorer into the gene
filtering chained intersection analysis and the normalization
analysis. See examples of existing plugins to help understand the
differences.
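To make the pipeline idea concrete, a chained intersection can be thought of as a list of tests that a gene must pass one after another. The following is a minimal Java sketch of that idea only. It does not use the real FilterPlugin class, and the class name FilterChainSketch, the use of String gene names and the two toy filters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative chain of gene filters; a gene survives only if every filter accepts it. */
class FilterChainSketch {
    private final List<Predicate<String>> filters = new ArrayList<>();

    /** Adds a filter to the end of the chain (a stand-in for inserting a filter plugin). */
    void addFilter(Predicate<String> filter) {
        filters.add(filter);
    }

    /** Applies the filters in order; the result is the intersection of what each one passes. */
    Set<String> apply(Set<String> genes) {
        Set<String> working = new LinkedHashSet<>(genes);
        for (Predicate<String> filter : filters) {
            working.removeIf(filter.negate());   // drop genes this filter rejects
        }
        return working;
    }

    public static void main(String[] args) {
        FilterChainSketch chain = new FilterChainSketch();
        chain.addFilter(name -> name.startsWith("g"));   // toy filter 1
        chain.addFilter(name -> name.length() <= 2);     // toy filter 2
        Set<String> all = new LinkedHashSet<>(Arrays.asList("g1", "g22", "h1"));
        System.out.println(chain.apply(all));
    }
}
```

A one-shot plugin, by contrast, would simply run once when invoked and would not be placed in such a chain.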
This document gives a simple tutorial example of MAEPlugins source
code. After you have read this you might look at some of the source code from actual
plugins. Note that there are several base class plugins
(PopupPlugin, FilterPlugin, NormalizationPlugin, etc.) that require
different override methods or have abstract methods you must
implement. Look at the examples to clarify this.
The XxxxxPlugin() method is called at the time the plugin is loaded.
Any particular actions that may be required can be performed at that time.
In this example, we merely set the name of the plugin as it is to appear
in the Plugins pull-down menu.
The pluginMain() method is called at the time the plugin is invoked by
selecting the menu entry.
The four special event handling methods updateCurGene(),
updateFilter(), updateSlider(), and updateLabels() are invoked by the
MAExplorer PopupRegistry when any of these events occurs. If you are
doing nothing with the events, they may be no-ops. However, if you
want to take action on these events, you would normally implement the
actual event handling code in your Xxxxx.java class.
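The overall shape of such a plugin can be sketched as follows. This is a self-contained illustration and deliberately does not use the real MAEPlugin classes. PluginBaseSketch is a hypothetical stand-in for a base class such as PopupPlugin, setPluginName() is a made-up helper, and the parameterless method signatures are assumptions. Only the method names and the "Show List Active Filters" menu label are taken from this manual.

```java
/** Hypothetical stand-in for a plugin base class; NOT the real Open Java API. */
abstract class PluginBaseSketch {
    private String menuName = "";

    protected void setPluginName(String name) { menuName = name; }   // made-up helper
    String getPluginName() { return menuName; }

    /** Called when the user selects the plugin's menu entry. */
    abstract void pluginMain();

    // Event hooks; by default they do nothing (no-ops).
    void updateCurGene() {}
    void updateFilter() {}
    void updateSlider() {}
    void updateLabels() {}
}

/** Example plugin: names itself at load time and reacts only to Filter changes. */
class ShowFiltersPluginSketch extends PluginBaseSketch {
    boolean invoked = false;
    int filterEvents = 0;

    /** Constructor, run when the plugin is loaded: set the name shown in the Plugins menu. */
    ShowFiltersPluginSketch() {
        setPluginName("Show List Active Filters");
    }

    @Override
    void pluginMain() {
        invoked = true;
        System.out.println("Plugin '" + getPluginName() + "' invoked");
    }

    @Override
    void updateFilter() {
        filterEvents++;   // a real plugin would refresh its report here
    }

    public static void main(String[] args) {
        ShowFiltersPluginSketch plugin = new ShowFiltersPluginSketch();
        plugin.pluginMain();
        plugin.updateFilter();
    }
}
```

The three hooks that are not overridden stay as no-ops, which matches the case where a plugin does nothing with those events.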
This document lists all of the MAEPlugins alphabetically, by analysis
method, and also links to MAEPlugins available on other Web sites. The
MAEPlugins include those donated to the MAExplorer Open source Web
site. All plugins distributed from this Web site will have the Java
source code, JAR file and documentation. Some of these MAEPlugins were
incorporated into MAExplorer after they were written because of their
key functionality. However, we are leaving them on the Web site
to serve as examples.
If you want to use the jar file plugins directly: (1) install
MAExplorer from the list of Jar files on this Web
site, (2) get the jar file(s) from the plugins below and save them in
the Plugins/ directory where you installed MAExplorer, (3) run
MAExplorer and use the (Plugins | Load Plugin) menu command to load
the plugin. After it is loaded, just use it as you would any other
menu command. The Plugins-jar.tar
file is available with all of the MAEPlugin jar files. Simply unpack
the directory using Unix tar or a Windows unzip program into a
directory you can access when running MAExplorer. To let MAExplorer go
directly to these files when you do a (Plugins | Load plugins) menu
command, copy the .jar files into the Plugins/ directory where you
previously installed MAExplorer. We also periodically update the
MAEPlugins-....-src.tar.gz file in the Files
download area. Files from the following list of MAEPlugins are
archived as follows: source files are from the CVS
archive, jar files are from the Web server archive
of plugin .jar files, documentation is also from the CVS
archive.
If you want to use these plugins as a basis for developing your own
plugins, see developing a
plugin and other resources available on this Web site. The source
code for each plugin is available below. We encourage, but don't
require, plugin writers to donate their new plugin analytic methods to
the MAExplorer Open Source Web site for others to use.
Cvt2Mae is a Java program designed to make it easier for
researchers to use MAExplorer by helping them convert their data into
the MAExplorer format. Cvt2Mae handles commercial chips such as
Affymetrix, as well as other standard formats such as GenePix and
Scanalyze or one-of-a-kind custom academic chips
(<User-defined>). In addition, you may specify the fields of
interest for the "Print file" or (GIPO or Gene-In-Plate-Order) file,
and the fields containing the quantified data.
The Cvt2Mae converts specific chip information you entered into what
we call an "Array Layout". This Array Layout file may be edited and
saved for use in future conversions and shared with collaborators.
Essentially, the Array Layout contains a set of "rules" for converting
the user's data. After you have filled out the forms in Cvt2Mae, it
will generate the set of converted data files and directories to be
used directly with MAExplorer.
There are several slide shows describing how to use the Cvt2Mae to
convert various data sets. They consist of a series of screen shots
from Cvt2Mae that go through each of the steps on how to set up the
parameters and convert your data. There are two for Affymetrix data, one
is a downloadable PDF and other an extensive online version.
Instructions on downloading and installing Cvt2Mae.
If you are then still having problems, email the help
desk. Please include:
The Edit Layout wizard also has its own information area that is used
for reporting. When you hold the mouse over a field on the left
side of the wizard window, information about that parameter will
appear in the lower message area.
Figure 1. Affymetrix tab-delimited data shown in Excel
(after missing fields have been edited as described above).
Figure 2. Initial state of the Cvt2Mae Program. The user
must select an array layout or define one in order to analyze the
input data file or files.
Figure 3. Selecting a Chipset Array Layout. The built-in
array layouts are shown for the Incyte and Affymetrix. User-defined
layouts would be added by selecting the <User-defined> layout.
Figure 4. Select one or more user input data files by pressing the
"Browse input file name" button and then pick a file. If the
layout indicates that it may contain more than one hybridization, it
will attempt to find the data. You can subsequently rename individual
samples which may be necessary if you are reading several files with
the same sub-sample names. After the file browser pops up, select a
user input data file. If you are using a file that contains all of
your samples, then you only need to specify one file. If you have
several files, then repeat this step until you have added all of the
files you want.
Figure 5. Files selected by user and samples "discovered" in the
data file. Each input file is analyzed to determine if it has
multiple samples and, if so, they are added to the list of input files
below step 2.1 in the window. You may remove any samples, which may
be necessary for bad data. You may rename any sample, which may be
necessary if the same sample name occurs in several different data
files (they are actually different samples).
Figure 6. Edit Layout Wizard for name of the Array Layout.
A) is the original array layout from the database. B)
Since we may want to edit it, we will rename the vendor and Array
layout name. This will enable us to save the changed layout if we
wish. You may not override system-defined layouts, but you may override
your own layouts or save a system layout under a new name (as is shown
here).
Figure 7. Edit Layout Wizard for Grid Geometry.
Figure 8. Edit Layout Wizard for Starting Data Rows.
Figure 9. Edit Layout Wizard for Ratio or Intensity data.
Figure 10. Edit Layout Wizard for optional (X,Y) spot coordinates
available in the input data.
Figure 11. Edit Layout Wizard for optional Genomic ID values
available in the input data.
Figure 12. Edit Layout Wizard for optional Gene Names available
in the data.
Figure 13. Edit Layout Wizard for optional calibration DNA available
in the data and UniGene species prefix.
Figure 14. Edit Layout Wizard for optional user names for Project,
Database, Subdatabase, etc.
Figure 15. Edit Layout Wizard for optional HP-X and HP-Y 'set' experimental
class (i.e. condition) names.
Figure 16. Edit Layout Wizard for changing the default data filter threshold
slider values.
Figure 17. Edit Layout Wizard for Assign GIPO fields. These Gene-In-Plate-Order data field
mappings should only be changed if required for additional data
fields you may have added to your input file. All fields should be
defined (this is required for <User-defined> data). In general, it
may be ok to have some non-critical genomic ID fields undefined.
Figure 18. Edit Layout Wizard for Assign Quant fields. These
Quantification data field
mappings should only be changed if required to define all fields
(it is required for <User-defined> data).
Figure 19. Saving modified Array Layout if you have made
changes. This is useful if you have changed the array layout with
"Edit Layout", "Assign GIPO fields", or "Assign Quant fields" so that
you can use it another time.
Figure 20. Selecting the output folder in which to save the
converted files. The Magenta "Save Layout" button means that you
may save the edited array layout if you wish. You now need to create
an output folder to put the converted data. You may create a New
Folder, use an Existing Folder or use the Same Folder that contained
the input files. We selected the "New Folder" option.
Figure 21. Browse to select the output folder in which to save the
converted files. You may create a new folder here.
Select the "name" of the folder - don't go into the folder.
Figure 22. shows the interface after selection of the output file
folder using a file browser. Notice that the current project directory
is now displayed in the interface as well as the location of the MAExplorer
Start.mae file that will be generated. The data will be created when
the Run button is pressed.
Figure 23. shows the conversion being performed after the user
pressed the RUN button. This process takes a minute or so
depending on the speed of the computer and the complexity of the data.
Figure 24. shows the conversion summary instructions after the
conversion is finished. At this point press the DONE button to
exit the converter.
Figure 25. shows the files that are generated by Cvt2Mae for use
by MAExplorer. The generated data consists of several directories
that are described in the Reference Manual Appendix C.
Figure 26. Starting MAExplorer on the converted data by clicking
on the Start.mae file. Note that the location of the
"MAExplorer startup file:" is specified. Go to that file and click on
it to start MAExplorer. Alternatively, start MAExplorer and do "File |
Open Disk DB" and open that file to start it.
Table 1. below lists the various
types of downloads: program installers, source code files, jar files,
and information on installing the programs. The Java API documentation is
also available. Table 2. lists various ways
to download the Mammary Genome Anatomy Project (MGAP) public data
set.
Click on the entries to download the files.
Similarly, in the Cvt2Mae program, pressing the "Update Cvt2Mae"
button will repeat the same process except that it does it for the
Cvt2Mae.jar file and creates a backup file called Cvt2Mae.jar.bkup.
However, you can generate your own javadocs for the code using the
Unix script
CreateMAExplorerJavaDocs.do for MAExplorer and
CreateCvt2MaeJavaDoc.do for Cvt2Mae.
1. Definitions.
1.1. ''Contributor'' means each entity that creates or contributes
to the creation of Modifications.
1.2. ''Contributor Version'' means the combination of the Original
Code, prior Modifications used by a Contributor, and the Modifications
made by that particular Contributor.
1.3. ''Covered Code'' means the Original Code or Modifications
or the combination of the Original Code and Modifications, in each case
including portions thereof.
1.4. ''Electronic Distribution Mechanism'' means a mechanism
generally accepted in the software development community for the electronic
transfer of data.
1.5. ''Executable'' means Covered Code in any form other than
Source Code.
1.6. ''Initial Developer'' means the individual or entity identified
as the Initial Developer in the Source Code notice required by Exhibit
A.
1.7. ''Larger Work'' means a work which combines Covered Code
or portions thereof with code not governed by the terms of this License.
1.8. ''License'' means this document.
1.8.1. "Licensable" means having the right to grant, to the maximum
extent possible, whether at the time of the initial grant or subsequently
acquired, any and all of the rights conveyed herein.
1.9. ''Modifications'' means any addition to or deletion from
the substance or structure of either the Original Code or any previous
Modifications. When Covered Code is released as a series of files, a Modification
is:
B. Any new file that contains any part of the Original Code or
previous Modifications.
1.10.1. "Patent Claims" means any patent claim(s), now owned
or hereafter acquired, including without limitation, method, process,
and apparatus claims, in any patent Licensable by grantor.
1.11. ''Source Code'' means the preferred form of the Covered
Code for making modifications to it, including all modules it contains,
plus any associated interface definition files, scripts used to control
compilation and installation of an Executable, or source code differential
comparisons against either the Original Code or another well known, available
Covered Code of the Contributor's choice. The Source Code can be in a compressed
or archival form, provided the appropriate decompression or de-archiving
software is widely available for no charge.
1.12. "You'' (or "Your") means an individual or a legal
entity exercising rights under, and complying with all of the terms of,
this License or a future version of this License issued under Section 6.1.
For legal entities, "You'' includes any entity which controls, is controlled
by, or is under common control with You. For purposes of this definition,
"control'' means (a) the power, direct or indirect, to cause the direction
or management of such entity, whether by contract or otherwise, or (b)
ownership of more than fifty percent (50%) of the outstanding shares or
beneficial ownership of such entity.
(b) under Patents Claims infringed by the making, using or selling
of Original Code, to make, have made, use, practice, sell, and offer for
sale, and/or otherwise dispose of the Original Code (or portions thereof).
(d) Notwithstanding Section 2.1(b) above, no patent license is
granted: 1) for code that You delete from the Original Code; 2) separate
from the Original Code; or 3) for infringements caused by: i) the
modification of the Original Code or ii) the combination of the Original
Code with other software or devices.
(b) under Patent Claims infringed by the making, using, or selling
of Modifications made by that Contributor either alone and/or in
combination with its Contributor Version (or portions of such combination),
to make, use, sell, offer for sale, have made, and/or otherwise dispose
of: 1) Modifications made by that Contributor (or portions thereof); and
2) the combination of Modifications made by that Contributor with
its Contributor Version (or portions of such combination).
(c) the licenses granted in Sections 2.2(a) and 2.2(b) are effective
on the date Contributor first makes Commercial Use of the Covered Code.
(d) Notwithstanding Section 2.2(b) above, no
patent license is granted: 1) for any code that Contributor has deleted
from the Contributor Version; 2) separate from the Contributor Version;
3) for infringements caused by: i) third party modifications of Contributor
Version or ii) the combination of Modifications made by that Contributor
with other software (except as part of the Contributor Version) or
other devices; or 4) under Patent Claims infringed by Covered Code in the
absence of Modifications made by that Contributor.
3.2. Availability of Source Code.
3.3. Description of Modifications.
3.4. Intellectual Property Matters
(b) Contributor APIs.
3.6. Distribution of Executable Versions.
3.7. Larger Works.
6.2. Effect of New Versions.
6.3. Derivative Works.
8.2. If You initiate litigation by asserting a patent infringement
claim (excluding declatory judgment actions) against Initial Developer
or a Contributor (the Initial Developer or Contributor against whom You
file such action is referred to as "Participant") alleging that:
(a) such Participant's Contributor Version directly or
indirectly infringes any patent, then any and all rights granted by such
Participant to You under Sections 2.1 and/or 2.2 of this License shall,
upon 60 days notice from Participant terminate prospectively, unless if
within 60 days after receipt of notice You either: (i) agree in writing
to pay Participant a mutually agreeable reasonable royalty for Your past
and future use of Modifications made by such Participant, or (ii) withdraw
Your litigation claim with respect to the Contributor Version against such
Participant. If within 60 days of notice, a reasonable royalty and
payment arrangement are not mutually agreed upon in writing by the parties
or the litigation claim is not withdrawn, the rights granted by Participant
to You under Sections 2.1 and/or 2.2 automatically terminate at the expiration
of the 60 day notice period specified above.
(b) any software, hardware, or device, other than such
Participant's Contributor Version, directly or indirectly infringes any
patent, then any rights granted to You by such Participant under Sections
2.1(b) and 2.2(b) are revoked effective as of the date You first made,
used, sold, distributed, or had made, Modifications made by that Participant.
8.3. If You assert a patent infringement claim against
Participant alleging that such Participant's Contributor Version directly
or indirectly infringes any patent where such claim is resolved (such as
by license or settlement) prior to the initiation of patent infringement
litigation, then the reasonable value of the licenses granted by such Participant
under Sections 2.1 or 2.2 shall be taken into account in determining the
amount or value of any payment or license.
8.4. In the event of termination under Sections 8.1 or
8.2 above, all end user license agreements (excluding distributors
and resellers) which have been validly granted by You or any distributor
hereunder prior to termination shall survive termination.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF
The Original Code is ______________________________________.
The Initial Developer of the Original Code is ________________________.
Portions created by
Contributor(s): ______________________________________.
Alternatively, the contents of this file may be used under the terms
of the _____ license (the “[___] License”), in which case the provisions
of [______] License are applicable instead of those above.
If you wish to allow use of your version of this file only under the terms
of the [____] License and not to allow others to use your version of this
file under the MPL, indicate your decision by deleting the provisions
above and replace them with the notice and other provisions required
by the [___] License. If you do not delete the provisions above,
a recipient may use your version of this file under either the MPL or the
[___] License."
[NOTE: The text of this Exhibit A may differ slightly from the text
of the notices in the Source Code files of the Original Code. You should
use the text of this Exhibit A rather than the text found in the Original
Code Source Code for Your Modifications.]
The U.S. Government LEGAL
notice accompanies the MPL 1.1 document.
This document comprises the LEGAL File pursuant to Articles 3.4 and 4
of the Mozilla Public License (version
1.1) stating the intellectual property and other limitations
associated with the use of MAExplorer under this License. Dr. Peter
Lemkin as an employee of The National Cancer Institute (NCI), an
agency of the United States Government, is the Initial Developer of
MAExplorer (the Original Code). As such, the following limitations
apply to this License:
Types of pseudoarray image displays
There are several different types of pseudoarray images that may be
displayed. The current type is set in the Show Microarray
submenu of the Plot menu. The selections include
Pseudograyscale intensity, which approximates the intensity of a
single sample or an average of samples. The Pseudocolor
Red(X)-Yellow-Green(Y) HP-X/HP-Y ratio or Zdiff and Pseudocolor
Red(Cy5)-Yellow-Green(Cy3) Cy3/Cy5 (or F1/F2) ratio or Zdiff
selections add the two samples or channels together as separate
Red+Green channels to give a color spectrum. The Pseudocolor HP-X/HP-Y
ratio or Zdiff and Pseudocolor Cy3/Cy5 (or F1/F2) ratio or Zdiff
selections give a color spectrum from a low ratio (Zdiff) value
(Green) to a high value (Red), with a value of 1.0 (0.0) shown as
Black. The Pseudocolor (HP-X,HP-Y) 'sets' p-value selection shows the
p-value between the X and Y sets as a color spectrum. If the Original
image is set and the image file is in the database, it will pop up a
separate Web browser window to display it. The Pseudograyscale display
is a grayscale image, with higher concentration genes appearing
darker, on a light blue background. The pseudocolor HP-X/HP-Y ratio of
spots image is constructed using a color scale going from bright green
(<1) to black (=1) to bright red (>1) on a black background. For the
pseudocolor Zdiff of (X-Y), the color scale goes from bright green
(<0) to black (=0) to bright red (>0). If the dichromasy
switch is set in the View menu, then a different set of colors is
selected that may be easier for some people to differentiate. If the
Use dual HP-X & HP-Y 'sets' else single samples toggle in
the Samples menu is set, it displays the mean HP-X data on the left
and HP-Y on the right for doing a side by side comparison.
Popup windows
MAExplorer starts with the main pseudoarray image window. This window
contains the pull-down menus where you may issue commands. As you
perform various operations, new windows may popup for some of these
commands. For most of these windows, you may click on the "Close"
button or click on the close window icon associated with your
operating system (generally one of the buttons at the top of the popup
window). However, some windows were designed not to close when you do
this. In particular, the "State sliders" cannot be closed unless the
associated data filtering or clustering operation is closed. Closing
the associated operation will automatically close the state slider
window.
The current sample, HP-X, and HP-Y
In MAExplorer, a hybridized array sample is abbreviated HP. The
underlying data comparison model assumes, as a minimum, the comparison
of two different experimental conditions represented by samples HP-X
and HP-Y. A good way to think about this is that these variables are
the two axes of a scatter plot (one of the displays you may
generate). The HP-X and HP-Y may be thought of as containing data from
either single hybridized samples or containing mean data from multiple
replicate sets of samples. The HP-X and HP-Y are assigned using the
Set current HP-X and Set current HP-Y in the
Samples menu. The sets are most easily changed using Choose HP-X,
HP-Y and HP-E to select the currently active samples. The
contents of the multiple sample HP-X and HP-Y 'sets' may
alternatively be changed using the Edit HP-X & HP-Y 'sets' of
samples by source submenu, and the HP-E list of samples using the
Edit HP-E list of samples by source. Assigning single samples
to either HP-X or HP-Y may be done from the Samples menu. However, it
is easier to do it by clicking on the pseudoarray image. First click
on the magenta "[X]" or "[Y]" Current Sample box at the top of the
list to switch between HP-X and HP-Y. Whichever is visible ([X] or
[Y]) is the one that will be the HP sample assigned. Then simply click
on the magenta "*" to the left of the sample name for the sample you
wish to assign.
Using 'sets' of HP-X and sets of HP-Y
Multiple samples may be assigned to the HP-X or HP-Y
sets. These are assigned using the Edit HP-X and HP-Y 'sets'
of microarrays in the Samples menu. The multiple sets are
enabled by setting the Use HP-X and HP-Y 'sets' else single
samples checkbox in the Samples menu. Then, when
statistical calculations are performed on that data, it will use the
means, std-deviations, etc. from each of these sets rather than
individual samples.
The HP-E sample list for computing expression profiles
You may cluster sets of genes with similar expression profiles across
a set of hybridized samples. The set of HP samples used in doing these
profiles is specified by Edit expression profile 'list (HP-E)
in the Samples menu. The Choose HP-X, HP-Y, and HP-E
command may also be used for defining the members and order of the
samples in the HP-E 'list'. Then, gene intensity expression profiles
may be created in a popup window for hybridized samples in the HP-E
set by using the Expression profile plot commands in the Plot menu.
Several of these plots may be created on the screen at the same
time. Clicking on a vertical data line in the plot will show the name
of the HP, its intensity and coefficient of variation (CV) of the
(F1,F2) data for this gene. Note that you can order the hybridized
samples in the HP-E set by the order in which they are added.
Data 'Filters' - the intersection of one or more data tests
A set of genes may be computed by taking the intersection of selected
gene sets. These sets are determined by various logical, data range
and statistical tests. Genes passing each test are assigned to a
gene subset, and these subsets are in turn used in the gene
intersection computation. The final gene subset is used in the array,
plots, reports, and subsequent data filtering. Changing any test
parameter causes the data filter to be re-computed.
1.4 Exploratory data analysis - overview
MAExplorer may be used to perform various data explorations by looking
for patterns correlated with different sets of hybridized samples or
with expression profiles of genes. This is discussed in more detail
throughout this manual and later in Section 3 on Exploratory Data
Analysis. Detailed descriptions of all commands are given in Section 2 Menus.
1.4.1 Saving the state of a data-mining session in stand-alone mode
If you are running MAExplorer in stand-alone mode, you may save the
state of your session for later use with the "Save DB" or "SaveAs DB"
commands. The checkpointed database can then be reopened using the
"Open file DB" command. It currently saves: the gene sets, condition
(HP) lists, current HP-X, HP-Y and HP-E lists, data Filter options and
slider value settings, display options, clustering options,
normalization options, etc. We recommend using "SaveAs DB" so
you can save the state under a different name rather than overwriting
the original state. This way you can go back to the original state if
you want to. The "SaveAs DB" and "Open file DB" commands are
described in the File menu.
1.4.2 Logging messages and command history
Often a user would like to review measurements of particular genes and
to review the list of commands they issued (also called the command
history). Various data measurements, as well as many other types of
information shown in the three text lines in the status area of the
main window, may optionally be recorded in a popup message log (Section
2.5.1), and the command history may be reviewed in a separate
popup message log (Section
2.5.2). If you are running the stand-alone version, the logs may
be saved. Otherwise, you could cut and paste the log data into
other word processing applications.
1.5 Quick start - demonstration of MAExplorer
MAExplorer is used as a stand-alone application. You may download the stand-alone
application (see Appendix D). This download also includes a demo data set
of 50 hybridized samples from the public MGAP database. In any case, you can
explicitly download the data at any time at
http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.zip or
http://prdownloads.sourceforge.net/maexplorer/MGAP-Array-database.tar.gz?download
Exiting MAExplorer
If you are in MAExplorer and want to close the program and exit, you may
use the Quit command in the File menu or click on the
"close application" button (found in the upper right hand corner of
MAExplorer windows, put there by your operating system).
1.6 Tutorials for using MAExplorer
There are a number of things you may do in this data mining facility.
We wrote two tutorials to help you understand its capabilities. We
recommend you first try the
short tutorial before attempting the advanced tutorial. The latter
demonstrates some of the more advanced capabilities.
2. MAExplorer menus
A MAExplorer analysis is performed using various interactive
controls. These commands are selected from pull-down menus located on
the "menu bar" at the top of the main MAExplorer window. The primary
menus are:
2.4.1 GeneClass - select gene subset for gene class data Filter
2.4.2 Normalization - select gene intensity normalization mode
2.4.3 Filter - select data filters to compute gene subset of interest
2.4.4 Plot - pseudoarray image, scatter, histograms, expression profile popup plots
2.4.5 Cluster - perform cluster analysis on data filtered genes
2.4.6 Report - generate popup spreadsheet reports of genes and samples
Menu notation
In the following menus, selections that are sub-menus are
indicated by a ''. Selections prefaced with a '' or a '' indicate that
the command is a checkbox that is enabled or disabled, respectively.
Checkbox menu items have a "[CB]" at the end of the command.
Selections prefaced with a '' or a '' indicate that the command is a
multiple choice "radio button" that is enabled or disabled,
respectively, and that only one member of the group is allowed to be
on at a time. Radio button menu items have a "[RB]" at the end of
the command. The default values set for an initial database are
shown in the menus. Selections prefaced with a '#' indicate
that the commands are available only when MAExplorer is run in the
stand-alone mode. Selections prefaced with a '*' indicate commands
that require access to the backend Web server [Future]. Selections that
are not currently available will be grayed out in the menus of the
running program.
2.1 File menu
The File menu operations include options and submenus
providing access to database data from disk and Web servers, state
saving, and groupware to share states between collaborators.
2.1.1 Databases menu
The Databases submenu is currently available only in stand-alone
mode and contains the following selections for opening and saving
databases.
2.1.2 Exploratory state menu
The Exploratory 'state' submenu contains the following
selections for saving and later using the user's state of the
exploration. If MAExplorer is being run on a local computer, no login
is required.
2.1.3 Groupware facility for sharing user states menu [Future]
The groupware facility allows users to share their state data with
other users. However, if MAExplorer is being run on a local computer,
then groupware may not be available since it depends on using specific
Web servers with MAExplorer-specific groupware services available.
[We first developed these concepts in the context of 2D protein
gels. They include: WebGel (Lemkin et al., 1999b) for Internet
exploration of 2DE databases; Xconf (Lemkin et al., 1993), an early
X-windows based image conferencing system, similar to CU-See-Me or AOL
Instant Messenger, for sharing images in a conference over the net;
Flicker (Lemkin, 1997), an image comparison tool that runs over the
Internet using a Java applet; and GELLAB-II (Lipkin and Lemkin, 1981),
a system for data mining (see the GELLAB-II Poster on the Web). These
embody many of the concepts used in MAExplorer.]
The Groupware submenu contains the following selections:
Saving the state and databases on the local file system
The current state of an exploratory data analysis session may be saved
on the local file system. The state consists of: gene sets;
HP-X and HP-Y hybridized sample sets; HP-E hybridized sample
lists; thresholds and switch setting preferences; etc. The user may
save their current state and restore a previous state at any
time. Restoring the state will override the current database.
Groupware sharing of intermediate exploratory results [FUTURE]
Each registered user will be able to save the current state of their
exploration of the data in named User State files on the
back-end Web server using the Save user's state command. The
user may keep multiple named states on the back-end secure server
where they may be accessed to restore the state at a future time using the
Open user's state command. The user can request a list of their
states with the Directory of user's states command. They may
remove a particular state with Delete user's state.
2.2 Samples menu
Each experimental condition sample is represented by a hybridized
sample (abbreviated HP in MAExplorer). The Samples menu
operations include operations to select the current hybridized sample
or samples. The simplest model of a MAExplorer analysis assumes (at
least) two microarray hybridized sample variables, HP-X and HP-Y, whose
data may be plotted against one another or compared. The default is a
single HP-X sample and a single HP-Y sample.
Figure 2.2.1 shows setting the HP-X, HP-Y, HP-E lists of samples using
the "Chooser" - the preferred method. Figure 2.2.2 shows setting the
HP-X sample from the menus. Figure 2.2.3 shows changing the current
HP-X sample by clicking on a sample name in the microarray image.
2.2.1 Selecting hybridized samples with Chooser or pull-down menu
sample lists
The Set Samples from lists submenu lets you define HP-X, HP-Y,
HP-X 'set', HP-Y 'set', or HP-E 'list' from lists of samples. It
contains four submenus:
The Edit HP-E expr. profile 'list' by source menu allows the user to
define an ordered list of samples for use in expression profile
statistics. Then, an expression vector of normalized quantification
values (one for each sample in the HP-E list) is computed for each
gene. Note: to place the samples in a particular order, start with an
empty HP-E set and then add them in the order you desire.
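The ordering rule above can be sketched in a few lines of Python. This is only an illustration and not MAExplorer code: the sample names, gene names, values, and the expression_profile helper are all hypothetical. It shows that the expression vector for a gene simply follows the order in which samples were added to the HP-E list.

```python
from typing import Dict, List


def expression_profile(gene: str,
                       hp_e_list: List[str],
                       data: Dict[str, Dict[str, float]]) -> List[float]:
    """Return the gene's normalized values, one per sample, in HP-E list order."""
    return [data[sample][gene] for sample in hp_e_list]


# Hypothetical normalized values: sample name -> {gene name -> value}.
normalized = {
    "pregnancy-13": {"geneA": 0.8, "geneB": 2.1},
    "lactation-1": {"geneA": 1.6, "geneB": 1.9},
    "virgin-10wk": {"geneA": 0.5, "geneB": 2.0},
}

# Start with an empty HP-E list and add the samples in the desired order.
hp_e: List[str] = []
for name in ["virgin-10wk", "pregnancy-13", "lactation-1"]:
    hp_e.append(name)

profile_a = expression_profile("geneA", hp_e, normalized)
print(profile_a)
```

Changing the order of the additions changes the order of the values in every gene's expression vector, which is why the note recommends starting from an empty list.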
The Convention for pull-down menu sample selection lists
We use a common sample selection scheme when selecting a sample from a
pull-down menu list. This sub-sample "By Source" option is only
available if your database was set up to allow sub-sample source names
in the Samples database.
2.2.2 Swapping selected samples' (Cy3,Cy5) channels in ratio data
dye-swap experiments
The Edit use (Cy5/Cy3) else (Cy3/Cy5) for each HP command may
be used to selectively swap (Cy3,Cy5) data entries so that dye-swap
data can be used as replicates (carefully, since gene labeling
efficiency is not always symmetric!). This is only available for
ratio data. It swaps the data held in MAExplorer (in memory only) so
that the Cy3 data is exchanged with the Cy5 data. For example, consider
the case where there are two materials A and B hybridized in two
experiments and labeled as follows: E1 (A=Cy3,B=Cy5) and E2
(A=Cy5,B=Cy3). Then, assuming uniform symmetric labeling (which is
generally not the case, although it might be true for a subset
of genes), one might average data from E1 and E2 if the data from
E1 (or E2) were swapped. This is shown in the following figure.
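The E1/E2 example can also be worked through numerically. The Python sketch below is an illustration under stated assumptions, not the MAExplorer implementation: the intensities are made up, the helper names swap_channels and mean_ratio are hypothetical, and averaging the ratios as a geometric mean (the mean of the log ratios) is one common choice rather than something this manual specifies.

```python
import math
from typing import List, Tuple

Spot = Tuple[float, float]  # (Cy3 intensity, Cy5 intensity) for one gene


def swap_channels(spots: List[Spot]) -> List[Spot]:
    """Exchange the Cy3 and Cy5 values of every spot (in memory only)."""
    return [(cy5, cy3) for (cy3, cy5) in spots]


def mean_ratio(e1: List[Spot], e2: List[Spot]) -> List[float]:
    """Per-gene geometric mean of the first/second channel ratios of two arrays."""
    result = []
    for (a1, b1), (a2, b2) in zip(e1, e2):
        log_mean = (math.log(a1 / b1) + math.log(a2 / b2)) / 2.0
        result.append(math.exp(log_mean))
    return result


# Hypothetical intensities for two genes.
e1 = [(200.0, 100.0), (50.0, 100.0)]  # E1: A=Cy3, B=Cy5
e2 = [(110.0, 198.0), (100.0, 50.0)]  # E2: A=Cy5, B=Cy3 (dye swap)

# After the swap, both arrays hold material A first and material B second.
e2_swapped = swap_channels(e2)
ratios = mean_ratio(e1, e2_swapped)
print(ratios)
```

Without the swap, the E2 ratios would be B/A rather than A/B, and averaging them with E1 would cancel the very differences being measured.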
2.2.3 Viewing sample HP-X, HP-Y, and HP-E partitions
You set up the HP-X and HP-Y sample sets and the HP-E expression
list of samples using the Chooser (above). The current
contents of these may be viewed using the List HP-X &
HP-Y sample 'sets' command to list the samples in the HP-X and HP-Y
'sets'. The List HP-E sample 'list' command may be used to list the
samples in the ordered HP-E 'list'.
2.2.4 Defining sample condition 'class' names
When using sets of conditions, the HP-X and HP-Y 'sets', you will
probably want to assign meaningful names to these sets. The commands
Define HP-X class name and
2.2.5 Toggling between single HP-X (-Y) samples and HP-X (-Y) sets
When MAExplorer first starts up, it assumes that you wish to treat the
data as single samples so that HP-X and HP-Y are assigned to single
samples. However, if you want to work with sets of multiple samples
then you must toggle the state using the Use HP-X & HP-Y 'sets'
else single samples [CB] check box command. This toggles the state
between treating the data as multiple samples (HP-X and HP-Y 'sets')
or as single HP-X and HP-Y samples.
2.2.6 Create and edit named condition lists of samples
The command Choose named condition lists of samples lets you
define new or edit existing named lists of hybridized samples called
"Condition lists". Associated with each condition list is a set of
annotation parameters to document the condition. The condition lists
may be used in the (Edit | Sets of conditions) operations. Among other
operations, you may assign any condition list to the working HP-X
'set', HP-Y 'set', or HP-E list of samples used throughout
MAExplorer. The last condition list that was edited with the (Sample
menu | Choose named
condition lists of samples) is called the "current condition" that
could be used in various operations. Figure 2.2.5 shows a screen
illustrating a popup condition chooser session where the legend
describes the options.
2.2.7 Create and edit named ordered condition lists (OCL) of conditions
The command Choose ordered condition lists of conditions lets
you define new or edit existing named ordered lists of conditions
called "Ordered Condition Lists" (OCL). Associated with each ordered
condition list is a set of annotation (name, value) pairs to document
the condition. The last condition list that was edited with the
(Sample menu | Choose
ordered lists of conditions) is called the "current OCL". The
current OCL is used by the (Filter menu | Filter by current
Ordered Condition List (OCL) F-Test [p-Value] slider [RB])
test. Figure 2.2.6 shows a screen illustrating a popup
ordered condition list chooser session.
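To give a feel for what an F-test across the conditions of an OCL measures, here is a minimal Python sketch of the standard one-way ANOVA F statistic for a single gene. It is an assumption that the filter uses this textbook form; the f_statistic helper and the replicate values are hypothetical, and the conversion of F to a p-value (which needs the F distribution) is left out.

```python
from statistics import mean
from typing import List


def f_statistic(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic: between-condition over within-condition variance."""
    k = len(groups)                    # number of conditions in the OCL
    n = sum(len(g) for g in groups)    # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical normalized values of one gene: one inner list per condition,
# in OCL order, with three replicate samples per condition.
gene_values = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_value = f_statistic(gene_values)
print(f_value)
```

A large F value means the condition means differ by more than the replicate scatter would explain, so a gene like this one would tend to pass a filter with a small p-value threshold.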
2.3 Edit menu
The Edit menu operations include operations to modify the
'edited gene list' that is set from a variety of Filters as well as
manually from this menu. The user may perform set operations (union,
intersection, and difference) on named sets of genes and sets of sample
experiments (conditions). User preferences are also set in this menu.
First compare two HPs using a statistical test such as a
t-test. Then save the resulting set of genes under the name "virgin
vs. pregnancy". Then compare the next two HPs and save the resulting
genes under the name "lactation vs. involution". Finally, compute
the difference of genes found in "virgin vs. pregnancy" that are not
found in "lactation vs. involution". This resulting gene set could
then be saved (e.g. with a name "Genes found in virgin
vs. pregnancy, but not in lactation vs. involution"). Similarly,
taking the intersection of these two named sets shows genes that are
common between the two sets. Taking the union shows genes found in
either of the two named sets.
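The three set operations in this example can be sketched in a few lines of Java. This is only an illustration using the standard java.util.Set methods; the class name GeneSetOps and the gene names are hypothetical and are not part of MAExplorer.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the MAExplorer API. */
final class GeneSetOps {
    /** OR (union): genes found in either set. */
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** AND (intersection): genes common to both sets. */
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** DIFFERENCE (A-B): genes in A that are not in B. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Made-up gene names standing in for two saved gene sets.
        Set<String> virginVsPregnancy =
            new TreeSet<>(Arrays.asList("geneA", "geneB", "geneC"));
        Set<String> lactationVsInvolution =
            new TreeSet<>(Arrays.asList("geneB", "geneD"));

        System.out.println(difference(virginVsPregnancy, lactationVsInvolution));
        System.out.println(intersection(virginVsPregnancy, lactationVsInvolution));
        System.out.println(union(virginVsPregnancy, lactationVsInvolution));
    }
}
```

Each method copies its first argument, so the saved input sets are left unchanged and the result can be stored under a new name.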
2.3.1 User edited gene list - the 'Edited Gene List' menu
You may define and edit arbitrary sets of genes using the User
edited gene list submenu to modify the 'Edited Gene List'
(EGL). This has sub-modes of operation for adding or removing genes
from the image by clicking on spots. If the Show 'Edited Gene
List' mode is set, you may see exactly which genes you have
defined by the magenta squares drawn around each gene in the EGL. Many
of the clustering operations will leave the current cluster in the
EGL. The commands include:
This gives you the functionality of adding and deleting genes from a
user defined list of genes to be analyzed. The EGL may be used with
the gene-set operations discussed in Section
2.3.2. You may also define genes in the EGL using the "Gene Name
Guesser" shown in Figure 2.3.1.
2.3.2 Sets of genes menu
These commands let you do comparisons of sets of genes generated under
different criteria. In addition, you may compute derived gene sets
from existing gene sets using set operations (OR, AND,
DIFFERENCE). You may also normalize the data by a gene subset. The
user may save the genes defined by: 1) the Filter, or 2) the
manually defined 'Edited Gene List'. The gene set resulting
from a binary gene set operation OR (union), AND (intersection), or
DIFFERENCE is saved in a new named gene set. The set difference (A-B) is
defined as the genes in set A that are not in set B. Genes in set B
that are not in set A are ignored. The 'User Filter Gene Set' may be
set to any gene set and may then be used as part of the gene Filter
cascade. The 'User Normalization Gene Set' may be set to any gene set
and may then be used to normalize gene intensity values across
hybridized samples. (See normalization
algorithm for more information on this method.)
User Gene Sets
Set# |#genes| title
=======================
#1 |1727| ALL GENES
#2 |394| ALL NAMED GENES
#3 |246| ESTs similar to genes
#4 |456| ESTs
#5 |1096| All genes and ESTs
#6 |1681| Good genes
#7 |40| Replicate genes
#8 |0| HousekeepingGenes
#9 |96| Calibration DNA
#10 |77| Your plates
#11 |46| Empty wells
--------- User Assignable ----------
#12 |0| User Filter Gene Set
#13 |60| Edited Gene List
#14 |0| Normalization Gene Set
--------- User definable------------
#15 |60| The 60 genes closest to Carbonic Anhydrase-III
#16 |30| Named genes in the 60 genes closest to CA-III
#17 |4| Replicate genes in the 60 genes closes to CA-III
2.3.3 Sets of sample conditions menu
In addition, MAExplorer can operate on sets of hybridized samples. For
example, a sample set might be replicate hybridized samples from the
same biological experiment sample, or it could be repeated experiments
on different samples of the same type. (One must be careful in
mixing data between the two cases because of the different expected
sources of variance). This means you can treat multiple replicate
samples as a distribution and compare the mean values for each gene in
one set of samples with the mean values for another set of samples. We
call these sets of hybridized samples condition lists or HP
lists. You may then put one or more HP samples into a condition
set. These sets in turn can be used for computing statistics on
clonal differences between different condition sets. Note that each
condition set may have multiple (i.e. different) samples. These
condition sets are saved with the user state when doing a
(File | Databases | SaveAs DB). As with sets of genes, there are a
number of operations to manipulate HP condition sets in the Sets of
Conditions menu, including:
Condition Lists
===============
Condition[1] #HPs 2, [Initial HP-X: C57B6 pregnancy day 13]
Condition[2] #HPs 2, [Initial HP-Y: Stat5a (-,-) pregnancy day 13]
Condition[3] #HPs 4, [Initial HP-E expression list]
Condition List #1 [Initial HP-X: C57B6 pregnancy day 13]
====================================
HP[1] Pregnancy 13 (1 hr) [C57B6-p13-totalRNA5ug]
HP[2] Pregnancy 13 (1 hr) [C57B6-p13.2poly-A]
2.3.4 Setting user preferences menu
The Preferences submenu is used to set various data labels,
statistical limits and other parameters. These include:
2.4 Analysis menu
The Analysis menu (see
Figure 1.4) contains an ordered list of six primary menus that
may be used, in that order, to perform an initial analysis. In more
complex analyses, the sequence of operations will vary and include
commands selected from other menus or will use these menus in
different order. The Analysis submenus are as follows:
2.4.1 GeneClass menu
A gene class (e.g. all named genes, ESTs, oncogenes, etc.) is a set of
genes that belongs to the class of genes in the universe of genes in
the particular microarray database. MAExplorer may restrict the set
of genes by "Gene Class" membership (currently includes All Genes, All
named Genes, ESTs similar to genes, Unknown ESTs, All genes and ESTs,
Good genes, Replicate
genes (i.e. with more than one copy of the gene in the array),
Calibration DNA, genes from user's plates). The additional Gene Class
list of names depends on its availability in a specific database.
Gene class            | Rule for class membership
=====================================================
All genes             | all genes on the array
All named             | genes not starting with "EST"
ESTs similar to genes | genes starting with "EST,"
ESTs                  | genes with the name "EST"
Replicate genes       | genes with multiple copies
Calibration DNA       | genes using the configuration file name "calibDNAname" (optional - see Appendix Table C.4.1)
Your plates           | clones using the configuration file name "yourPlates" (optional - see Appendix Table C.5.1-C)
Empty Wells           | empty wells where no spot exists on the array, indicated by keywords "empty", "empty well" or "EmptyWell" (optional - see Appendix Table C.5.1-C)
Good Genes            | spots on the array where the GIPO QualCheck data was used and was valid. If it was not used, then it assumes all spots are good. (optional - see Appendix Table C.4.1)
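The name-based rules in this table can be written as simple string tests. The sketch below is an illustration only: the class and method names are hypothetical, and whether MAExplorer compares names case-sensitively is an assumption made here (exact case for "EST", any case for the empty-well keywords).

```java
/** Hypothetical illustration of the name-based Gene Class rules. */
final class GeneClassRules {
    /** "All named": gene names not starting with "EST". */
    static boolean isNamedGene(String name) {
        return !name.startsWith("EST");
    }

    /** "ESTs similar to genes": gene names starting with "EST,". */
    static boolean isEstSimilarToGene(String name) {
        return name.startsWith("EST,");
    }

    /** "ESTs": gene names that are exactly "EST". */
    static boolean isEst(String name) {
        return name.equals("EST");
    }

    /** "Empty Wells": names matching one of the empty-well keywords
     *  (case-insensitive matching is an assumption). */
    static boolean isEmptyWell(String name) {
        return name.equalsIgnoreCase("empty")
            || name.equalsIgnoreCase("empty well")
            || name.equalsIgnoreCase("EmptyWell");
    }

    public static void main(String[] args) {
        String[] names = { "Jun-B oncogene", "EST", "EST, similar to casein", "EmptyWell" };
        for (String n : names) {
            System.out.println(n + ": named=" + isNamedGene(n)
                + " estSimilar=" + isEstSimilarToGene(n)
                + " est=" + isEst(n)
                + " emptyWell=" + isEmptyWell(n));
        }
    }
}
```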
2.4.1.1 GeneClass ontology subsets [Future]
If the Set Gene Class subset were activated, it might
include categories such as the following. If the categories exist and
the data is made available to MAExplorer, then it is possible to
specify gene subsets by Gene Class name. It is the responsibility of
the database creator to define a mapping table supporting these named
subsets of named genes.
2.4.1.2 Simulating Gene Class ontologies using Gene Set
operations
You can effectively find ontology subsets for Gene Class
subsets using the following procedure. The trick is to repeatedly
define an EGL gene subset, using the gene name guesser to find the
genes of interest, and save it as a named gene subset. Edit out genes
you don't want. Then repeatedly take the OR of the gene sets of
interest, saving the result as a new named set, then take the OR of
another gene set with the set you just created, and so on.
Procedure
2.4.2 Normalization menu
The Normalization menu operations include operations to
normalize gene intensity data between hybridized samples. This is
critical in being able to compare samples because of differences in
amount of sample, labeling efficiency and variations in scanner
operation including gain and baseline settings. There are several
methods available including normalizing by Zscore, median, log mean,
Zscore of logs, calibration DNA, housekeeping genes, etc. The specific
microarray image
quantification is determined by the image analysis program being used
to pre-process the arrays.
Note: although this set of normalization methods is limited, it is
adequate for some analyses of the data. We are in the process of
adding more normalization methods through MAEPlugin methods.
2.4.2.1 Intensity background correction
The background intensity data from the spot quantification programs
may be used to correct spot intensity. Background may be specified as
either a global value or on a per-spot basis. If the array images have
low background, then the lack of background values may not be much of
a problem. The background-corrected intensity for HP i and spot j is
I'_ij = I_ij - bkgrd_HPi
Ratio computation for Cy3 and Cy5 data
For most MAExplorer operations, the intensity of a gene is generally
computed as the mean intensity of the spots (background corrected or
not) which duplicate that gene on the microarray. When working with
dual hybridized samples using Cy3- and Cy5-dUTP labeling, which results
in green and red fluorescence, the Cy3/Cy5 ratio can be used to
self-normalize the intensity for each hybridized clone array. If
local background is available, then the ratio can be computed for HP h
and spot j as
(Cy3_hj - BkgrdCy3_hj) / (Cy5_hj - BkgrdCy5_hj)
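As a minimal sketch, the background-corrected ratio for one spot could be computed as follows. The class and method names are hypothetical, and returning NaN when the corrected Cy5 value is not positive is an assumption of this sketch, not documented MAExplorer behavior.

```java
/** Hypothetical illustration of the background-corrected Cy3/Cy5 ratio. */
final class Cy3Cy5Ratio {
    /**
     * Ratio (Cy3 - BkgrdCy3) / (Cy5 - BkgrdCy5) for one spot.
     * Returns NaN if the corrected Cy5 value is zero or negative (assumption).
     */
    static double ratio(double cy3, double bkgrdCy3, double cy5, double bkgrdCy5) {
        double numerator = cy3 - bkgrdCy3;
        double denominator = cy5 - bkgrdCy5;
        if (denominator <= 0.0) {
            return Double.NaN;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Made-up intensities for a single spot.
        System.out.println(ratio(1200.0, 200.0, 600.0, 100.0));
    }
}
```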
2.4.2.2 Normalization between microarrays to allow comparison
The normalization of quantitative data is crucial when comparing data
between different microarray samples. There are a number of different schemes
possible. One is to normalize by the sum of known calibration,
housekeeping genes or other "constant expression" genes in the
microarray. Another is to sum the background corrected integrated
density for all spots in an array and to normalize individual gene
measurements by that sum. These methods are now described in more
detail. As the MAEPlugins
facility becomes available, we will be adding a number of more
sophisticated gene-specific normalization methods that take many of
the problems specific to microarrays into account.
Normalizing by scaled Zscore of intensity
The "normalized Zscore of intensity" method normalizes each hybridized
sample by the mean and standard deviation of the raw intensities for
all of the spots in that sample. The mean intensity
mnI_i and the standard deviation
sdI_i are computed for the raw intensity of 'Good
genes'. It is useful for standardizing the mean (to 0.0) and the range
of data between hybridized samples to about -4.0 to +4.0. When using the
Zscore, you compute Zdiff(erences), not ratios. The Zscore intensity
Zscore_ij for intensity I_ij
for HP i and spot j is computed as
Zscore_ij = (I_ij - mnI_i) / sdI_i,
and
Zdiff_j(x,y) = Zscore_xj - Zscore_yj.
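The Zscore and Zdiff computations can be sketched as below. This is an illustration under stated assumptions: the names are hypothetical, the input array is assumed to hold the raw intensities of the 'Good genes' of one sample, and the sample standard deviation (dividing by n-1) is assumed because the manual does not say which form is used.

```java
/** Hypothetical illustration of Zscore normalization for one sample. */
final class ZscoreNorm {
    /** Returns (I - mean) / stdDev for every intensity in the sample. */
    static double[] zscores(double[] intensity) {
        int n = intensity.length;
        double sum = 0.0;
        for (double v : intensity) {
            sum += v;
        }
        double mean = sum / n;

        double sumSq = 0.0;
        for (double v : intensity) {
            sumSq += (v - mean) * (v - mean);
        }
        // Assumption: sample standard deviation with n-1 in the denominator.
        double stdDev = Math.sqrt(sumSq / (n - 1));

        double[] z = new double[n];
        for (int j = 0; j < n; j++) {
            z[j] = (intensity[j] - mean) / stdDev;
        }
        return z;
    }

    /** Zdiff for one gene: Zscore in sample X minus Zscore in sample Y. */
    static double zdiff(double zscoreX, double zscoreY) {
        return zscoreX - zscoreY;
    }

    public static void main(String[] args) {
        double[] z = zscores(new double[] { 1.0, 2.0, 3.0 });
        System.out.println(java.util.Arrays.toString(z));
    }
}
```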
Normalizing by the median of intensity
The "Median intensity" method normalizes each hybridized sample by
the median of the raw intensities of 'Good genes' for all of the
spots in that sample. It is a useful normalization to use when you want
to compute X/Y ratios between hybridized samples.
Im_ij = I_ij / medianI_i
Normalizing by the log of median of intensity
The "Log median intensity" method normalizes each hybridized sample
by the log of median scaled raw intensities of 'Good genes' for all
of the spots in that sample. The value 1.0 is added to the intensity
value to avoid taking the log(0.0) when intensity has zero value. This
is a useful normalization to use when you want to compute X/Y ratios
between hybridized samples and compress the scale. Because we are computing
a log, we report the difference between HP-X and HP-Y as (X-Y) instead
of a ratio (X/Y).
Im_ij = log(1.0 + (I_ij / medianI_i))
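The median and log median normalizations can be sketched together. The names below are hypothetical; the input is assumed to be the raw intensities of the 'Good genes' of one sample, and the natural logarithm is an assumption because the manual does not state the log base.

```java
import java.util.Arrays;

/** Hypothetical illustration of median and log median normalization. */
final class MedianNorm {
    /** Median of the values; the mean of the two middle values for even counts. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /** Im = I / median. */
    static double[] byMedian(double[] intensity) {
        double med = median(intensity);
        double[] out = new double[intensity.length];
        for (int j = 0; j < intensity.length; j++) {
            out[j] = intensity[j] / med;
        }
        return out;
    }

    /** Im = log(1.0 + I / median); natural log is an assumption. */
    static double[] byLogMedian(double[] intensity) {
        double med = median(intensity);
        double[] out = new double[intensity.length];
        for (int j = 0; j < intensity.length; j++) {
            out[j] = Math.log(1.0 + intensity[j] / med);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sample = { 2.0, 4.0, 6.0 };
        System.out.println(Arrays.toString(byMedian(sample)));
        System.out.println(Arrays.toString(byLogMedian(sample)));
    }
}
```

Adding 1.0 inside the log keeps a zero intensity at a normalized value of 0.0 instead of negative infinity.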
Normalizing by scaled Zscore of log intensity, standard deviation
The "Normalize by Zscore of log intensity, stdDev" method normalizes
each hybridized sample by the mean and standard deviation of the
logs of the raw intensities for all of the spots in that sample. The
mean log intensity mnLI_i and the standard
deviation of log intensity sdLI_i are computed for the
log of raw intensity of 'Good genes'. Then the Zscore intensity
ZlogS_ij for HP i and spot j is
ZlogS_ij = (log(I_ij) - mnLI_i) / sdLI_i
Normalizing by scaled Zscore mean absolute deviation of log intensity
The "Normalize by Zscore of log intensity, mean absolute deviation"
method normalizes each hybridized sample by the mean and mean
absolute deviation of the logs of the raw intensities for all of the
spots in that sample. The mean log intensity mnLI_i
and the mean absolute deviation of log intensity
madLI_i are computed for the log of raw intensity
of 'Good genes'. Then the Zscore intensity
ZlogA_ij for HP i and spot j is
ZlogA_ij = (log(I_ij) - mnLI_i) / madLI_i
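The two log-based Zscore variants differ only in the scale term, as this sketch shows. The names are hypothetical; it assumes strictly positive intensities, the natural logarithm, a sample standard deviation (n-1), and a mean absolute deviation taken about the mean of the logs.

```java
/** Hypothetical illustration of Zscore-of-log normalization (stdDev and MAD). */
final class ZscoreLogNorm {
    static double[] logs(double[] intensity) {
        double[] out = new double[intensity.length];
        for (int j = 0; j < intensity.length; j++) {
            out[j] = Math.log(intensity[j]); // assumes intensity > 0
        }
        return out;
    }

    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    /** Sample standard deviation (n-1); the choice of n-1 is an assumption. */
    static double stdDev(double[] v, double mean) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / (v.length - 1));
    }

    /** Mean absolute deviation about the mean. */
    static double meanAbsDev(double[] v, double mean) {
        double sumAbs = 0.0;
        for (double x : v) {
            sumAbs += Math.abs(x - mean);
        }
        return sumAbs / v.length;
    }

    /** ZlogS = (log(I) - mnLI) / sdLI. */
    static double[] zlogS(double[] intensity) {
        double[] li = logs(intensity);
        double mn = mean(li);
        return scale(li, mn, stdDev(li, mn));
    }

    /** ZlogA = (log(I) - mnLI) / madLI. */
    static double[] zlogA(double[] intensity) {
        double[] li = logs(intensity);
        double mn = mean(li);
        return scale(li, mn, meanAbsDev(li, mn));
    }

    private static double[] scale(double[] li, double mn, double spread) {
        double[] out = new double[li.length];
        for (int j = 0; j < li.length; j++) {
            out[j] = (li[j] - mn) / spread;
        }
        return out;
    }
}
```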
By 'User Normalization Gene Set'
This method is useful when a subset of genes has been determined to have
relatively constant expression across the set of samples. It
normalizes by the sum of intensities for a subset of genes defined by
the user in the 'User
Normalization Gene Set' (Section 2.3.2) using the gene set editing
commands. Normalizing by the sum of genes uses the sum
Igs_i that is computed for microarray HP_i
with intensities I_ij for all genes j in
the gene subset.
Igs_i = Sum ( I_ij ), summed over all genes j in the gene subset for HP_i
Then, the normalized intensity I'_ij is computed as:
I'_ij = I_ij / Igs_i
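A minimal sketch of this normalization for one sample follows. The names are hypothetical, and representing the normalization gene set as an array of spot indices is an assumption made for the illustration.

```java
/** Hypothetical illustration of normalizing by the sum of a gene subset. */
final class GeneSetSumNorm {
    /**
     * Divides every intensity in the sample by Igs, the summed intensity
     * of the genes listed in geneSetIndices.
     */
    static double[] normalize(double[] intensity, int[] geneSetIndices) {
        double igs = 0.0;
        for (int idx : geneSetIndices) {
            igs += intensity[idx];
        }
        double[] out = new double[intensity.length];
        for (int j = 0; j < intensity.length; j++) {
            out[j] = intensity[j] / igs;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sample = { 10.0, 20.0, 30.0, 40.0 };
        int[] normalizationGenes = { 1, 2 }; // made-up subset
        System.out.println(java.util.Arrays.toString(normalize(sample, normalizationGenes)));
    }
}
```

The same routine would apply to the 'Calibration DNA' method, with the index list taken from the spots flagged as calibration DNA.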
By 'Calibration DNA' set
If a predefined set of calibration DNA genes is available on the
array, it may be used to normalize density values between the
samples. The calibration DNA genes are defined by special gene names
that are declared in the Configuration file using the 'calibDNAname'
parameter (see Appendix C Table C.5.1(C)). If
there is no calibration DNA, this entry is not used. The algorithm is
the same as "User Normalization Gene Set" (above), but the set is
predefined as the genes flagged as calibration DNA. For example, in
the MGAP database, these spots are the "mouse genomic DNA" spots so
the Configuration file entry would be calibDNAname="m.g. DNA".
Scaling intensity data to 65K
Another method, "Scale intensity data to 65K", scales the maximum
intensity of each sample to 65K (the maximum intensity). Since the
raw scanned data is often 16-bit, it can have a maximum value of
65535 (2^16 - 1), and so this does minimum scaling. This method
may make it easier to view the data initially using the
pseudoarray image. However, it may not properly scale the data between
arrays and should probably not be used in quantitative comparisons.
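A sketch of this scaling, assuming it is a simple linear rescale so that the largest intensity in the sample maps to 65535 (the manual does not give the exact formula). The names are hypothetical.

```java
/** Hypothetical illustration of scaling a sample's maximum intensity to 65535. */
final class Scale65K {
    static final double MAX_16_BIT = 65535.0; // 2^16 - 1

    static double[] scale(double[] intensity) {
        double max = 0.0;
        for (double v : intensity) {
            max = Math.max(max, v);
        }
        double[] out = new double[intensity.length];
        if (max <= 0.0) {
            return out; // assumption: an all-zero sample stays all zero
        }
        for (int j = 0; j < intensity.length; j++) {
            out[j] = intensity[j] * MAX_16_BIT / max;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(scale(new double[] { 100.0, 200.0, 400.0 })));
    }
}
```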
No normalization
You may also want to look at the raw intensity (or Cy3 and Cy5
channel) data. Turning off normalization gives you the raw data read
into MAExplorer.
2.4.2.3 Using different normalizations to 'see' different data views
Changing the normalization method will sometimes make differences
between data sets more apparent. The following figure shows the same
data in two different scatter plots but with two different
normalizations.
2.4.3 Filter menu
The final set of genes presented for display, plotting, reports, etc.
is determined by a cascade of gene "data filters" that generate a
restricted gene set. The cascade is computed in real-time using the intersection of individual
criteria and tests selected by the user. Examples of Filter
criteria include: membership in a particular gene set, ratio
(HP-X/HP-Y) within a range, passing statistical tests such as t-tests
or F-test, etc.
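Conceptually, the cascade keeps a gene only if it passes every active filter. The sketch below shows that idea with java.util.function.Predicate; the class name, the representation of a gene as a two-element {X, Y} intensity array, and the example thresholds are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of a filter cascade as an intersection of tests. */
final class FilterCascade {
    /** Returns the genes that pass every filter in the list. */
    static <G> List<G> apply(List<G> genes, List<Predicate<G>> filters) {
        List<G> passed = new ArrayList<>();
        for (G gene : genes) {
            boolean ok = true;
            for (Predicate<G> filter : filters) {
                if (!filter.test(gene)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                passed.add(gene);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        // Each gene is a made-up {HP-X intensity, HP-Y intensity} pair.
        List<double[]> genes = Arrays.asList(
            new double[] { 10.0, 5.0 },
            new double[] { 4.0, 4.0 },
            new double[] { 1.0, 8.0 });

        List<Predicate<double[]>> filters = new ArrayList<>();
        filters.add(g -> g[0] / g[1] >= 1.0); // ratio HP-X/HP-Y within a range
        filters.add(g -> g[0] > 5.0);         // intensity threshold

        System.out.println(apply(genes, filters).size());
    }
}
```

Adding or removing a predicate and re-running apply corresponds to switching a filter on or off in the cascade.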
Filtering using a statistical test by selecting a p-value
These tests will filter genes meeting
the test criteria if the resulting p-value of that test is <= the
value specified by the p-Value state slider. Only one test may be
active at a time. If you switch to a new p-value test, it will disable
the previous p-value test. If any of these tests are selected, it
will pop up the p-Value state slider window for you to set the
p-Value. There are two t-tests: one operating on duplicate (F1,F2)
data if available, and one operating on the HP-X, HP-Y 'sets' if they are defined. The
Kolmogorov-Smirnov test operates on HP-X,HP-Y 'sets' if they are
defined. The F-test operates on the current Ordered
Condition List (OCL) consisting of any number of condition lists each
containing at least 2 (replicate) samples/condition.
Filtering out genes with high replicate spot variation
The Spot CV filter mode submenu contains options to select how
the spot CV filter is to be applied. It computes the maximum value of
CV for all of the samples in the particular sample set specified. That
maximum value is then used for the spot CV filter test. Genes having
a large difference between the spot quantification
values of corresponding duplicate spots may be filtered out. You may compute the
coefficient of variation CV_j for the two values
(f1_j and f2_j) for a particular
gene j.
CV_j = 2 |f1_j - f2_j| / (f1_j + f2_j)
If the database only has one field but replicate HPs, then you may use
the HP-X & HP-Y 'sets' CV_j to filter the
genes. The CV_j values are then tested against a CV
threshold slider value to eliminate genes with a high coefficient of
variation.
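The duplicate-spot CV test can be sketched as follows. The names are hypothetical; the sketch assumes f1 + f2 is positive for every sample and that a gene passes when its maximum CV is less than or equal to the threshold (the manual does not say whether the comparison is strict).

```java
/** Hypothetical illustration of the spot CV filter for one gene. */
final class SpotCvFilter {
    /** CV = 2|f1 - f2| / (f1 + f2) for one pair of duplicate spots. */
    static double cv(double f1, double f2) {
        return 2.0 * Math.abs(f1 - f2) / (f1 + f2);
    }

    /** Maximum CV of the gene over all samples; f1[s], f2[s] are its duplicates in sample s. */
    static double maxCv(double[] f1, double[] f2) {
        double max = 0.0;
        for (int s = 0; s < f1.length; s++) {
            max = Math.max(max, cv(f1[s], f2[s]));
        }
        return max;
    }

    /** The gene is kept if its maximum CV does not exceed the slider threshold. */
    static boolean passes(double[] f1, double[] f2, double cvThreshold) {
        return maxCv(f1, f2) <= cvThreshold;
    }

    public static void main(String[] args) {
        double[] f1 = { 100.0, 80.0 };
        double[] f2 = { 50.0, 80.0 };
        System.out.println(maxCv(f1, f2));
        System.out.println(passes(f1, f2, 0.5));
    }
}
```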
2.4.3.1 Data filtering using multiple gene data filters
Any or all of the data filters may be selected simultaneously. In
particular, if you select filters that use parameter threshold
scrollers, they will be added to a state scroller window (see Figure
2.3.4.1 for details) to allow adjustment of all sliders
simultaneously. You may change various thresholds and see the effect
in real time. Note: some of the scrollers are more sensitive to low
values. Therefore, we set them to respond non-linearly with a more
precise vernier at the low end.
2.4.4 Plot menu
The Plot menu lets you display a pseudoarray image, scatter plots,
ratio and intensity histograms, and expression profile plots. The
pseudoarray image is displayed in the main MAExplorer window. All of
the other plots are displayed in popup windows. Depending on the
particular plot, multiple instances may be allowed. The Plot
submenus are:
2.4.4.1 Show microarray pseudoarray images menu
You may show the pseudoarray
image of the current hybridized samples using several
modalities. The grayscale pseudoarray image is generated from the
quantified spot data. If the data contains the actual spot positions
of the genes (as generated by the various array image quantification
programs), the spots may be drawn using a scaled version of those
coordinates. Otherwise, a generic set of grids (and fields if there
are multiple fields) is synthesized to represent the spot
positions. Pseudoarray images may also be useful as an alternative
modality for displaying X/Y ratio or X-Y Zdiff data. If the
normalized intensities are the same, then the spot will appear as
black with the overall spot intensity depending on the spot
concentrations. High ratios and Zdiffs will be red and low values
green as shown in Table
2.4.4.1. The p-Value results of comparing a HP-X 'set' with a
HP-Y 'set' of samples can be displayed as a color spectrum pseudoarray
image.
Scale box | Normalization mode - RGB | Normalization mode - dichromasy | Ratio data | Zscore data | Zscore Log data
=================================================================================================================
1         | bright green             | bright blue                     | <0.250X    | <-3.0       | <-0.99
2         | .                        | .                               | 0.307X     | -2.25       | -0.742
3         | .                        | .                               | 0.400X     | -1.50       | -0.495
4         | dark green               | dark blue                       | 0.571X     | -0.75       | -0.247
5         | Black                    | Black                           | 1.000X     | 0.00        | 0.000
6         | dark red                 | dark orange                     | 1.75X      | 0.75        | 0.247
7         | .                        | .                               | 2.50X      | 1.50        | 0.495
8         | .                        | .                               | 3.25X      | 2.75        | 0.742
9         | bright red               | bright orange                   | >4.00X     | >3.0        | >0.99
Clicking on a particular gene will report its specific quantification
and identification values (See
Section 3.3 on gene quantification). If the Enable display current
gene in popup genomic DB Web Browser option is set in the
View menu, then it will also pop up a Web browser with the
entry for that gene in the particular genomic DB, if
it exists.
2.4.4.1.1 Examples of microarray intensity data pseudoarray image
The relative intensity may be displayed for the current sample (last
HP-X or HP-Y selected) or for two samples (HP-X and HP-Y samples, or HP-X and
HP-Y 'sets' of samples). To show two samples side by side, enable the
Use dual HP-X & HP-Y Pseudoimage option in the Show Microarray
submenu. To show averaged set data in the dual mode, enable the "Use
HP-X & HP-Y 'sets'" option in the Samples menu. The grayscale
value reflects the current normalization mode.
2.4.4.1.2 Example of microarray ratio or Zdiff data pseudocolor image
The ratio (HP-X/HP-Y) or Zdiff (HP-X - HP-Y) normalized intensity data
may be displayed as a pseudoarray image for HP-X and HP-Y, or HP-X and
HP-Y 'sets' of samples. To show averaged set data in the dual mode,
enable the "Use HP-X & HP-Y 'sets'" option in the Samples menu.
The colors of the 9 scale boxes represent the normalized expression
ranges and are assigned according to the current normalization mode
listed in the table.
2.4.4.2 Scatter plots menu
Scatter plots include HP-X vs HP-Y intensity for comparing data
between HP-X and HP-Y samples (or sets if the HP-X and -Y sample
'sets' mode is enabled in the Samples menu - as is shown in Figure 2.2.1). You may
zoom into any area of the scatter plot as is shown in Figure
2.4.4.4.2(C). If there are duplicate spots for each gene, you may
plot F1 vs F2 intensity (or Cy3 vs Cy5 if using ratio data) for
comparing replicate data (or Cy3 and Cy5 ratio data channels) within
the same sample. It will also compute the correlation coefficient for
the data and display it in the plot and in the message panel. The data
is the intensity values using the current normalization method. If you
are analyzing ratio Cy3/Cy5 data, you may compare Cy3 or Cy5 of the
HP-X sample against Cy3 or Cy5 of the HP-Y sample. If you are in
stand-alone mode, a SaveAs GIF button will also be
available. This saves the current plot as a full resolution GIF file
specified by the user in a popup file browser window.
rSq=0.974, n=1728, X(mn+-sd)=(4.477+-7.845), Y(mn+-sd)=(12.379+-24.810)
The Scatter plots submenu includes:
Scatter plots of data from multiple channels on the same sample
It is also possible to plot the separate channels within a single sample
against each other, for example F1 vs F2 in samples with replicate
data, and Cy3 vs Cy5 in samples with separate ratio data channels.
2.4.4.3 Histogram plots menu
You may compare ratios or Zdiffs of data using the HP-XY ratios or
Zdiff command to display a ratio histogram of Filtered intensity
data from two samples selected from the Samples menu. The HP-XY
'set' ratio or Zdiff is used if there are multiple samples in the
HP-X or HP-Y sets; the mean values in each of the sets are then used in
the calculations. If there are duplicate spots for each gene, you may
plot the F1F2 ratio or Zdiff histogram of the F1/F2 ratios or
F1-F2 Zdiff values for normalized data for each spot in the currently
displayed sample. If you are in stand-alone mode, a SaveAs GIF
button will also be available to save the current plot as a full
resolution GIF file specified by the user in a popup file browser
window.
2.4.4.4 Expression profile plots menu
You may generate an individual expression profile plot (EP plot) or a
scrollable list of EP plots. The ordered list of hybridized samples to
plot is specified by the HP-E set. In the latter case, the genes are
specified by the data Filter.
2.4.5 Cluster menu
The Clustering menu lets you perform various types of gene and condition
clustering operations. When you invoke a clustering operation it will
pop up one or more windows and may modify the pseudoarray image. Some
of the popup windows include clustergram and dendrogram analysis plots
used with the hierarchical clustering.
Use of clustering to find patterns of similar gene expression
Clustering is a way of possibly finding co-expressed genes that
exhibit similar expression changes in a set of samples. Genes may show
similar co-expression, but that does not prove they are co-regulated
at the same point in a pathway - merely that measurements of those
genes in a particular set of experiments show similar
expression. However, identifying genes with similar expression for
which some information is already known about some of the genes may be
useful as a starting point to help figure out gene function and
possibly aspects of its pathways in cell function using additional
experiments and analysis.
Hint: when working with very large data sets with many samples, it
may be useful to pre-adjust the distance and/or number of clusters
threshold sliders to an approximate range using the (Edit Menu |
Preferences | Adjust all Filter threshold scrollers). This is because
once the clustering starts, it does not (currently) let you abort the
clustering to change the threshold value.
LSQdist_ij = Sqrt( Sum ( D'_hj - D'_hi )**2 ) / n
  h in HP-E
  i,j in Filtered genes, i not j
Let,
  sum_ij = Sum( D'_hj * D'_hi ),
  mn_i = (1/n) Sum( D'_hi ),
  mn_j = (1/n) Sum( D'_hj ),
  sumSq_i = Sum( D'_hi * D'_hi ),
  sumSq_j = Sum( D'_hj * D'_hj ),
then,
                    [sum_ij - n*(mn_i * mn_j)]
  r_ij = ------------------------------------------------------------
         [Sqrt(sumSq_i - n*mn_i*mn_i) * Sqrt(sumSq_j - n*mn_j*mn_j)]
  h in HP-E
  i,j in Filtered genes, i not j
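The two similarity measures can be sketched in Java as below. The names are hypothetical and the arrays are assumed to hold the normalized values D' of two genes over the n samples in HP-E. The distance follows the formula as printed, with the division by n applied outside the square root. The correlation uses the standard Pearson form, in which each denominator term is sumSq - n*mn*mn.

```java
/** Hypothetical illustration of expression profile distance and correlation. */
final class ProfileDistance {
    /** LSQdist = Sqrt( Sum_h (D'_hj - D'_hi)^2 ) / n. */
    static double lsqDist(double[] di, double[] dj) {
        int n = di.length;
        double sum = 0.0;
        for (int h = 0; h < n; h++) {
            double diff = dj[h] - di[h];
            sum += diff * diff;
        }
        return Math.sqrt(sum) / n;
    }

    /** Pearson correlation coefficient r between two profiles. */
    static double correlation(double[] di, double[] dj) {
        int n = di.length;
        double sumIJ = 0.0, sumI = 0.0, sumJ = 0.0, sumSqI = 0.0, sumSqJ = 0.0;
        for (int h = 0; h < n; h++) {
            sumIJ += di[h] * dj[h];
            sumI += di[h];
            sumJ += dj[h];
            sumSqI += di[h] * di[h];
            sumSqJ += dj[h] * dj[h];
        }
        double mnI = sumI / n;
        double mnJ = sumJ / n;
        double numerator = sumIJ - n * mnI * mnJ;
        double denominator =
            Math.sqrt(sumSqI - n * mnI * mnI) * Math.sqrt(sumSqJ - n * mnJ * mnJ);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] geneI = { 1.0, 2.0, 3.0 };
        double[] geneJ = { 2.0, 4.0, 6.0 };
        System.out.println(lsqDist(geneI, geneJ));
        System.out.println(correlation(geneI, geneJ));
    }
}
```

A profile that is constant across all samples gives a zero denominator, so the correlation is undefined (NaN) in that case.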
Handling of hierarchical clustering of large numbers of genes -
problem with slow response
The hierarchical clustering algorithm uses a gene-gene floating
point (i.e. 32-bit) distance matrix of order N^2 (for N data
filtered genes). This means that if you are experiencing a slow
response, it may be due to several factors, some of which you may not
be able to control. You might:
2.4.5.1 Cluster genes with expression profiles similar to current gene
The Cluster genes with expression profiles similar to current
gene command is used to find genes with similar HP-E expression profiles,
as measured by a least square error that is less than the cluster
distance threshold. It pops up the "Cluster Distance" threshold
scroller. Then click on a gene in the microarray image. It then pops
up a window with a list of the similar genes and their expression
profile distances to the current gene. Each gene that passes the
cluster distance threshold test is indicated in the image with a blue square where the size of the square is
proportional to its similarity. It also displays a sorted list of the
genes with the cluster distance in the cluster panel that was popped
up. On each line is a series of '*****' - the more stars, the higher
the similarity to the seed gene. This is a silhouette plot that
is used to display a sorted list of similar objects and is similar
to that described in (Kaufman and
Rousseeuw, 1990).
Larger squares indicate that more genes are similar. You may
change the cluster distance threshold and it will update the display
and the list. In addition, the 'edited gene list' is set to the
subset of genes that belong to the current cluster.
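A sketch of the seed-gene search: it collects every gene whose profile distance to the seed is below the threshold and sorts the hits by distance. The names are hypothetical, genes are identified by their row index in a profiles matrix (an assumption), and the distance is the least-squares form Sqrt(Sum of squared differences) / n.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of finding genes similar to a seed gene. */
final class SeedGeneCluster {
    static double lsqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int h = 0; h < a.length; h++) {
            double diff = a[h] - b[h];
            sum += diff * diff;
        }
        return Math.sqrt(sum) / a.length;
    }

    /** Row indices of genes closer to the seed than the threshold, nearest first. */
    static List<Integer> similarTo(final int seed, final double[][] profiles, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int g = 0; g < profiles.length; g++) {
            if (g != seed && lsqDist(profiles[seed], profiles[g]) < threshold) {
                hits.add(g);
            }
        }
        hits.sort(Comparator.comparingDouble(idx -> lsqDist(profiles[seed], profiles[idx])));
        return hits;
    }

    public static void main(String[] args) {
        double[][] profiles = {
            { 1.0, 1.0 }, { 1.1, 1.0 }, { 5.0, 5.0 }, { 1.0, 1.3 }
        };
        System.out.println(similarTo(0, profiles, 0.5));
    }
}
```

Calling similarTo again with a different threshold mirrors moving the "Cluster Distance" slider; the returned list would then replace the contents of the edited gene list.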
2.4.5.2 Cluster counts of similar filtered genes by expression profiles
The Cluster counts of similar Filtered genes by expression
profiles command analyzes the set of all Filtered genes for the
expression profile defined by the HP-E samples. It counts the number
of similar genes for each Filtered gene and draws a
blue circle whose size is proportional to the number of genes
similar to that gene. After it analyses these genes it lists the
genes and their counts in the cluster panel. You may change the
cluster distance threshold and/or Filter parameters and it will update
the display and the list. If you click on a gene with a green circle, it will switch to single gene
cluster mode (with the blue squares).
2.4.5.3 K-means clustering of gene expression profiles for filtered genes
The K-means cluster gene expression profiles for Filtered genes
command searches the data Filtered gene list for the genes
(i.e. primary genes) with the N most orthogonal expression
profiles. It will start this recursive computation from the gene with
minimum distance to all other genes unless you have selected a
"current gene" with the mouse. All Filtered genes are assigned to the
nearest K-means primary node. The mean cluster vector is computed and
used as the new definition of the cluster center. If you set the "Use
median instead of mean for K-means clustering" option in the
Clustering submenu, it will compute the center as a median instead of
a mean (Bickel, 2001). K-means
clustering is described in (Sneath
and Sokal, 1973). A new K-means primary gene (i.e. gene for the
cluster center) is found that is closest to this new center. Then all
of the data Filtered genes are reassigned to the new cluster
centers. The mean+-stdDev of the within-cluster distance to its center
is computed. It then pops up a text window with an ordered report of
the Filtered genes illustrated by part of a report shown below. [This
is part of a report from a 38 sample MGAP database subset of 141 genes
from the set of named genes restricted by the CV data filter.] The
"Similarity" data is plotted as a silhouette plot using
variable length strings of '****'. Note that clusters where the similarity is about the same for the entire
cluster (e.g. cluster #4) contain genes that probably belong together
in the same cluster. Clusters where it is not (e.g. Cluster 6) probably
contain two smaller more robust clusters.
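The assign-and-recompute loop described in this section can be sketched generically as follows. This is a plain k-means with farthest-first seeding, written for illustration only, and it is not MAExplorer's implementation: all names are hypothetical, Euclidean distance is assumed, and the steps that snap each center to the nearest "primary gene" and the optional median center are left out.

```java
/** Hypothetical, simplified k-means sketch over gene expression profiles. */
final class KMeansSketch {
    static double dist(double[] a, double[] b) {
        double sum = 0.0;
        for (int h = 0; h < a.length; h++) {
            double d = a[h] - b[h];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Picks k seed genes: the start gene, then repeatedly the gene farthest from all chosen seeds. */
    static int[] chooseSeeds(double[][] p, int k, int start) {
        int[] seeds = new int[k];
        seeds[0] = start;
        for (int c = 1; c < k; c++) {
            int best = 0;
            double bestD = -1.0;
            for (int g = 0; g < p.length; g++) {
                double dMin = Double.MAX_VALUE;
                for (int s = 0; s < c; s++) {
                    dMin = Math.min(dMin, dist(p[g], p[seeds[s]]));
                }
                if (dMin > bestD) {
                    bestD = dMin;
                    best = g;
                }
            }
            seeds[c] = best;
        }
        return seeds;
    }

    /** Assigns every gene to its nearest cluster center. */
    static int[] assign(double[][] p, double[][] centers) {
        int[] label = new int[p.length];
        for (int g = 0; g < p.length; g++) {
            for (int c = 1; c < centers.length; c++) {
                if (dist(p[g], centers[c]) < dist(p[g], centers[label[g]])) {
                    label[g] = c;
                }
            }
        }
        return label;
    }

    /** Recomputes each center as the mean profile of its cluster; empty clusters keep their old center. */
    static double[][] means(double[][] p, int[] label, double[][] previous) {
        int k = previous.length;
        double[][] sum = new double[k][p[0].length];
        int[] count = new int[k];
        for (int g = 0; g < p.length; g++) {
            count[label[g]]++;
            for (int h = 0; h < p[g].length; h++) {
                sum[label[g]][h] += p[g][h];
            }
        }
        for (int c = 0; c < k; c++) {
            if (count[c] == 0) {
                sum[c] = previous[c];
                continue;
            }
            for (int h = 0; h < sum[c].length; h++) {
                sum[c][h] /= count[c];
            }
        }
        return sum;
    }

    /** Returns the cluster label of each gene after the given number of update passes. */
    static int[] cluster(double[][] p, int k, int start, int iterations) {
        int[] seeds = chooseSeeds(p, k, start);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = p[seeds[c]].clone();
        }
        int[] label = assign(p, centers);
        for (int it = 0; it < iterations; it++) {
            centers = means(p, label, centers);
            label = assign(p, centers);
        }
        return label;
    }
}
```

For example, KMeansSketch.cluster(profiles, 6, seedIndex, 10) would label each row of a hypothetical profiles matrix with one of six clusters, starting the seeding from seedIndex.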
Cluster report for 6 K-means clusters with 141 genes being clustered.
The seed gene is [1248564] Jun-B oncogene.
Clone ID Similarity Cluster-# Distance-to-cluster Gene-Name
-------- -------------- --------- ------------------- ----------------
1248411 ************** 1 Cluster [26 genes] in cluster [distNext: 1.035] wiCdist:mn+-sd=1.223+-0.453 CV=0.371 Calpactin I light chain
1381592 ********** 1 0.448 Surfeit gene 4
1247956 ********* 1 0.706 Protein kinase, cAMP dependent, catalytic, beta
1381836 ******** 1 0.761 Prohibitin
1382325 ******** 1 0.771 M.musculus mRNA for C1D protein
1248270 ******** 1 0.775 Seven in absentia 1A
1247716 ******** 1 0.794 Lipoprotein lipase
1248184 ******** 1 0.847 Mus musculus bromodomain-containing protein BP75 mRNA, complete cds
1248564 ******* 1 0.864 Jun-B oncogene
1382667 ******* 1 0.888 SERINE/THREONINE PROTEIN PHOSPHATASE PP2A-BETA, CATALYTIC SUBUNIT
1382561 ******* 1 0.931 Mus musculus GTP-specific succinyl-CoA synthetase beta subunit (Scs) mRNA, partial cds
1248089 ****** 1 1.013 M.musculus RPS3a gene
1247780 ****** 1 1.088 Proprotein convertase subtilisin/kexin type 7
1247557 ****** 1 1.104 M.musculus L28 mRNA for ribosomal protein L28
1248321 ***** 1 1.278 Decay accelerating factor 1
1382751 **** 1 1.311 Clusterin
1382007 **** 1 1.357 Murine mRNA with homology to yeast L29 ribosomal protein gene
1382074 **** 1 1.390 Orosomucoid 1
1381963 **** 1 1.417 M.musculus mRNA for ribosomal protein L36
1248278 ** 1 1.658 HISTONE H3.3
1247630 ** 1 1.675 Procollagen, type I, alpha 2
1247865 * 1 1.837 Mouse beta-D-galactosidase fusion protein mRNA, complete cds
1382236 * 1 1.85 Caspase 7
1247833 1 1.882 Mus musculus radio-resistance/chemo-resistance/cell cycle checkpoint control protein (Rad9) mRNA, complete cds
1248535 1 1.953 M.musculus mRNA for selenoprotein P
1247702 1 2.157 Cytochrome C oxidase, subunit Va
1382282 ************** 2 Cluster [13 genes] in cluster [distNext: 24.199] wiCdist:mn+-sd=16.184+-6.667 CV=0.412 Max interacting protein 1
1382159 ********** 2 9.086 TRANSPLANTATION ANTIGEN P35B
1247854 ********* 2 11.002 Prolyl 4-hydroxylase, beta polypeptide
1247970 ******** 2 11.786 Mouse mRNA for osteoblast specific factor 2 (OSF-2)
1381663 ******** 2 12.948 Mus musculus vacuolar adenosine triphosphatase subunit A gene, complete cds
1382100 ******** 2 13.34 T-complex protein 1, related sequence 1
1248366 ******** 2 13.541 Mus musculus cytochrome c oxidase subunit VIIa-L precursor (Cox7al) mRNA, nuclear gene encoding mitochondrial protein, complete cds
1247568 ******** 2 13.762 Cathepsin D
1247872 ******* 2 14.015 Mus musculus endothelial monocyte-activating polypeptide I mRNA, complete cds
1382333 ******* 2 14.065 Stromal cell derived factor 5
1382008 ******* 2 15.985 Mus musculus FK-506 binding protein homolog (SAM11) mRNA, complete cds
1247724 **** 2 21.964 Glutathione-S-transferase, alpha 3
1247846 2 34.704 House mouse; Musculus domesticus kidney mRNA for Phosphatidic acid phosphatase, complete cds
1247945 ************** 3 Cluster [22 genes] in cluster [distNext: 11.979] wiCdist:mn+-sd=7.559+-3.347 CV=0.443 Mus musculus mRNA for DEDD protein
1247797 ********** 3 4.159 Mus musculus Btk locus, alpha-D-galactosidase A (Ags), ribosomal protein (L44L), and Bruton's tyrosine kinase (Btk) genes, complete cds
1382087 ********** 3 4.494 Cell division cycle 42
1247539 ********** 3 4.511 EST
1248212 ********** 3 5.009 Murine mRNA for integrin beta subunit
1248470 ********** 3 5.044 EST
1247521 ********* 3 5.299 Mus musculus mRNA for peroxisomal integral membrane protein PMP34
1381808 ********* 3 5.924 Mus musculus UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase-T3 mRNA, complete cds
1381970 ********* 3 6.285 Mus musculus thioredoxin mRNA, nuclear gene encoding mitochondrial protein, complete cds
1382168 ********* 3 6.343 N-terminal Asn amidase
1382704 ********* 3 6.36 Mus musculus N-myristoyltransferase 1 mRNA, complete cds
1248548 ********* 3 6.378 Mus musculus WDR protein mRNA, complete cds
1247564 ******** 3 6.652 Erythrocyte protein band 7.2
1248588 ******** 3 6.67 M.musculus BAP31 mRNA
1247541 ******** 3 6.690 Apolipoprotein D
1248462 ******** 3 7.322 Sterol O-acyltransferase 1
1248462 ******** 3 7.42 Sterol O-acyltransferase 1
1248521 ****** 3 9.121 Mus domesticus nuclear binding factor NF2d9 mRNA, complete cds
1382212 ****** 3 10.137 Thyroid autoantigen 70 kDa
1382270 ***** 3 10.529 Voltage-dependent anion channel 2
1248152 ***** 3 10.541 M. musculus mRNA for MAP kinase-activated protein kinase 2
1247678 3 19.431 Casein alpha
1247543 ************** 4 Cluster [44 genes] in cluster [distNext: 1.035] wiCdist:mn+-sd=0.439+-0.266 CV=0.606 RAS-related C3 botulinum substrate 1
1381923 ************ 4 0.158 Prolyl 4-hydroxylase, beta polypeptide
1382052 ************ 4 0.209 Trans-acting transcription factor 1
1247882 *********** 4 0.237 Mus musculus AMP activated protein kinase mRNA, complete cds
1248099 *********** 4 0.246 Mus musculus mitogen-responsive 96 kDa phosphoprotein p96 mRNA, alternatively spliced p67 mRNA, and alternatively spliced p93 mRNA, complete cds
1248351 *********** 4 0.251 Abl-interactor 1
1247540 *********** 4 0.255 Mus musculus mRNA for ZIP-kinase, complete cds
1248316 *********** 4 0.26 Mus musculus proteasome alpha7/C8 subunit mRNA, complete cds
1382671 *********** 4 0.264 Mouse MA-3 (apoptosis-related gene) mRNA, complete cds
1382014 *********** 4 0.277 Transcription elongation factor B (SIII), polypeptide 1 (15 kDa),-like
1247885 *********** 4 0.289 Mus musculus mRNA for ryudocan core protein, complete cds
1248294 *********** 4 0.292 Mus musculus thioredoxin-related protein mRNA, complete cds
1382066 *********** 4 0.306 Inhibitor of DNA binding 2
1248597 *********** 4 0.307 Lipocortin 1
1248591 *********** 4 0.324 Interferon beta, fibroblast
1248445 ********** 4 0.333 Mus musculus beta prime coatomer protein mRNA, partial cds
1247775 ********** 4 0.34 House mouse; Musculus domesticus male brain mRNA for ARF1, complete cds
1382750 ********** 4 0.340 Thymoma viral proto-oncogene
1247905 ********** 4 0.341 Monokine induced by gamma interferon
1381668 ********** 4 0.351 Mus musculus mitogen-activated protein kinase-activated protein kinase mRNA, complete cds
1381811 ********** 4 0.356 Protein tyrosine phosphatase, receptor type, D
1382031 ********** 4 0.358 Protease (prosome, macropain) 28 subunit, beta
1248345 ********** 4 0.363 Mus musculus alpha-methylacyl-CoA racemase mRNA, complete cds
1382555 ********** 4 0.364 Lysosomal membrane glycoprotein 1
1247820 ********** 4 0.367 Tight junction protein 1
1247598 ********** 4 0.374 Retinoblastoma 1
1247595 ********** 4 0.378 PROBABLE CALCIUM-BINDING PROTEIN PMP41
1381928 ********** 4 0.379 Mus musculus MRJ (Mrj) mRNA, complete cds
1248196 ********** 4 0.399 Max protein
1381691 ********** 4 0.423 SRY-box containing gene 17
1248225 ********** 4 0.434 Mus musculus heat shock transcription factor 1 (Hsf1) gene, partial cds
1248084 ********** 4 0.442 Mus musculus Supl15h gene
1247941 ********* 4 0.453 Fibroblast growth factor inducible 14
1381623 ********* 4 0.468 Stearoyl-coenzyme A desaturase 1
1248202 ********* 4 0.473 Mouse mRNA for PAP-1, complete cds
1382115 ********* 4 0.512 GLUTATHIONE S-TRANSFERASE GT8.7
1382044 ********* 4 0.515 Cartilage derived retinoic acid sensitive protein
1381636 ******** 4 0.567 Lymphotoxin B
1381920 ******** 4 0.569 Mus musculus mRNA for NEFA protein, complete cds
1247757 ******** 4 0.596 Granzyme B
1382094 ******** 4 0.609 High mobility group protein 1
1247545 ******** 4 0.638 Carbon catabolite repression 4 homolog (S. cerevisiae)
1247607 *** 4 1.188 POLYADENYLATE-BINDING PROTEIN
1247727 4 1.667 Malate dehydrogenase, mitochondrial
1248244 ************** 5 Cluster [19 genes] in cluster [distNext: 3.473] wiCdist:mn+-sd=4.273+-2.059 CV=0.482 CD80 antigen
1248534 ********** 5 1.648 Carbonyl reductase
1247764 ********** 5 1.776 H-2 CLASS II HISTOCOMPATIBILITY ANTIGEN, GAMMA CHAIN
1381933 ********* 5 2.345 Mouse rpS17 mRNA for ribosomal protein S17, complete cds
1381616 ********* 5 2.42 Mus musculus oral tumor suppressor homolog (Doc-1) mRNA, partial cds
1248232 ********* 5 2.486 Mus musculus putative glycogen storage disease type 1b protein mRNA, complete cds
1382644 ******** 5 2.717 Cyclin G
1248125 ******** 5 2.791 Histocompatibility 2, class II, locus Mb2
1247799 ******** 5 2.869 Mus musculus signal recognition particle receptor beta subunit mRNA, complete cds
1247708 ******** 5 3.024 Ephrin A1
1247932 ****** 5 4.235 Mus musculus (clone: pMAT1) mRNA, complete cds
1382515 ***** 5 4.668 ATPase, Na+/K+ beta 3 polypeptide
1248586 ***** 5 4.838 Mus musculus viral envelope like protein (G7e) gene, complete cds
1248198 *** 5 5.874 Mus musculus D9 splice variant 2 mRNA, complete cds
1381623 ** 5 6.224 Stearoyl-coenzyme A desaturase 1
1382086 * 5 6.885 Mus musculus (strain C57Bl/6) mRNA sequence
1247887 * 5 7.014 Mouse chromosome 6 BAC-284H12 (Research Genetics mouse BAC library) complete sequence
1247886 5 7.810 Cut (Drosophila)-like 1
1248303 5 8.094 Lipopolysaccharide response
1247621 ************** 6 Cluster [17 genes] in cluster [distNext: 19.157] wiCdist:mn+-sd=12.410+-3.024 CV=0.244 Mus musculus Lsc (lsc) oncogene mRNA, complete cds
1248050 ******* 6 7.407 Mus musculus C57BL/6J ribosomal protein S28 mRNA, complete cds
1247698 ******* 6 7.571 Adipocyte protein aP2
1248240 ***** 6 9.198 Mus musculus mRNA, complete cds
1247862 **** 6 9.844 Mus musculus Nmi mRNA, complete cds
1382162 **** 6 10.330 CAMP responsive element modulator
1248398 *** 6 11.007 Mouse mRNA for ribosomal protein S12
1248281 *** 6 11.143 M.musculus mRNA for histone H3.3A
1247852 *** 6 11.576 Twist gene homolog, (Drosophila)
1381991 ** 6 12.809 Prolyl 4-hydroxylase, beta polypeptide
1382753 ** 6 13.019 Mus musculus cleavage and polyadenylation specificity factor (MCPSF) mRNA, complete cds
1248368 * 6 13.639 Mus musculus ribosomal protein S26 (RPS26) mRNA, complete cds
1247639 * 6 13.692 SRY-box containing gene 4
1248435 6 14.262 Thymus cell antigen 1, theta
1247961 6 14.75 ATP SYNTHASE ALPHA CHAIN, MITOCHONDRIAL PRECURSOR
1248344 6 15.217 Gut enriched Kruppel-like factor
1382234 6 16.351 CD8 antigen, beta chain
2.4.5.4 Hierarchical clustering of expression profiles
The Hierarchical clustering of expression profiles method computes the
hierarchical clustering of the expression profiles of the data Filtered
genes and displays a clustergram and an optional dendrogram.
Hierarchical clustering is described in (Sneath and Sokol, 1973). The gene
data is normalized either by the corresponding HP-X sample data for
each gene or by the maximum raw intensity of each HP sample in the
expression profile, as selected by the Normalize by HP-X else HP's max
intensities menu toggle. There are three types of clustering
linkage: average-arithmetic-linkage,
average-centroid-linkage, and next-minimum-linkage.
These may be modified using the weighted average,
which gives equal weight to the child clusters in computing the mean
of a new cluster, or the un-weighted average, which weights them by
the number of non-terminal clusters. Average-linkage clustering is
computationally intensive and may take a while. Next-minimum-linkage is
much faster and may give adequate clustering in some
situations.
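As a rough illustration of the average-linkage idea (this is not MAExplorer's actual implementation), the following Python sketch repeatedly merges the two closest clusters, taking the cluster-to-cluster distance as the arithmetic mean of all pairwise Euclidean distances between member profiles. The function names and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering with average (arithmetic mean) linkage.

    profiles: dict mapping a gene name to its list of normalized values.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [(name,) for name in profiles]   # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean of all pairwise distances between members of a and b.
            d = sum(euclidean(profiles[p], profiles[q])
                    for p in a for q in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges

# Hypothetical toy profiles over three samples.
toy = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [5.0, 5.0, 5.0],
}
for left, right, dist in average_linkage(toy):
    print(left, right, round(dist, 3))
```

Every pass rescans all cluster pairs and all member pairs, which gives a feel for why average linkage becomes slow as the number of Filtered genes grows.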
[Clustergram color scale: bright green (<1/8X), dark green (1/2X), black (1X), dark red (2X), bright red (>8X), with intermediate shades at 1/6X, 1/4X, 4X, and 6X.]
The current gene may be set by clicking on a row, which is then
highlighted in green. If you click on a colored box, it will also
report the HP name for that column and its normalized expression value
(highlighting that box with a white circle). If the Web genomic
databases are enabled (through the View menu), then it will also pop up
a Web page for that gene. If you set the current gene in any of the
array, scatter plot, gene guesser, etc. displays, it will also be set in
the clustergram, which is positioned at that gene. If the Dendrogram
checkbox is enabled, then a dendrogram is drawn to the left of the
clustergram boxes. Clicking on a region in the dendrogram sets a
distance threshold (displayed at the top) and displays in red all parts of
the dendrogram tree that have a cluster distance less than the threshold
you defined. If the zoom nnX button is pressed, then the
dendrogram drawing is magnified nn-fold to make highly similar
clusters more visible. Pressing the button repeatedly cycles through:
1X, 2X, 5X, 10X, 20X. Sub-regions of the clustergram may be explored
in more detail using the EP plot button, which pops up a
scrollable window of the ordered gene list. You may generate
multiple EP-subset plots so as to compare different parts of the
clustergram. A report of all of the ordered genes may be created
using the ClustGram Report button. The Show HP names
button pops up a numbered list of all samples used in the expression
profiles and clustergram. The ClustGram report has all of the normalized
expression profiles on the right side of the report.
2.4.6 Report menu
Various reports summarizing gene or sample data may be generated and
appear in popup tables. These include:
2.4.6.1 Array report menu - hybridized samples global data
You may generate reports of sample array information. The first two
menu selections contain descriptive information about specific
hybridized microarray samples. The "Extra Samples info" selection contains
quantitative and extra descriptive information (if available for your
database).
rSq=0.748, n=1656, HP:1(mn+-sd)=(28991+-19564), HP:2(mn+-sd)=(5044+-9766)
2.4.6.2 Gene reports menu
You may generate gene reports with various additional options. You
must set the Web access checkbox if you want to click on a blue
hyperlink in the resulting report to access an associated Web
database. In addition, specialized gene reports may be generated from
some of the cluster plot command windows. These include lists of
genes sorted by cluster (K-means cluster #), by hierarchical cluster order,
by similarity to a gene, etc. The mean cluster expression values may
be reported for K-means clustering.
2.4.6.2.1 Filtered gene reports menu
You may generate gene reports of Filtered genes with various
additional presentation options. In the highest/lowest N genes reports, N
defaults to 100 and is set by the (Report | Table format | Set max # genes
in highest/lowest report) command.
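A minimal sketch of what a highest/lowest N selection amounts to is shown below. It assumes the report ranks genes by a single value such as the X/Y ratio; the function name and the ratios are hypothetical and are not taken from MAExplorer.

```python
import heapq

def highest_lowest(ratios, n=100):
    """Return (highest, lowest) lists of (gene, ratio) pairs, at most n each."""
    items = list(ratios.items())
    highest = heapq.nlargest(n, items, key=lambda kv: kv[1])
    lowest = heapq.nsmallest(n, items, key=lambda kv: kv[1])
    return highest, lowest

# Hypothetical X/Y ratios for a handful of Filtered genes.
ratios = {"g1": 0.2, "g2": 3.5, "g3": 1.0, "g4": 8.1, "g5": 0.05}
top, bottom = highest_lowest(ratios, n=2)
print(top)
print(bottom)
```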
2.4.6.3 Table format menu
The report is presented as a table. However, it may be visualized
in several different ways. The scrollable spreadsheet includes the
ability to click on blue hypertext items and have a Web browser pop up
for that item on a Web database (e.g. GenBank, dbEST, UniGene,
LocusLink, mAdb Genes, GeneCard, etc). The tab-delimited option
enables you to cut the table and paste it into a separate spreadsheet
program such as Excel. You may also extend the data in the table by
'Adding' expression profile ratios and statistics from the HP-X and
HP-Y 'set' comparisons.
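For readers who post-process such tables outside MAExplorer, the sketch below shows one way to produce tab-delimited text that pastes cleanly into a spreadsheet. The column names and ratio values are hypothetical; only Python's standard csv module is used.

```python
import csv
import io

def to_tab_delimited(header, rows):
    """Render a report table as tab-delimited text, one row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical report columns and values (the ratios are made up).
header = ["CloneID", "Gene name", "X/Y ratio"]
rows = [
    ["1247543", "RAS-related C3 botulinum substrate 1", 1.25],
    ["1248244", "CD80 antigen", 0.48],
]
text = to_tab_delimited(header, rows)
print(text)
```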
2.4.6.4 Table font size menu
For wider tables, you can see more information if you use a smaller
font to display the table. The font sizes available are:
2.5 View menu
The View menu options are used to modify the view of genes
visible in the pseudoarray image. Genes may be displayed with
additional properties or capabilities including access to Web-based
genomic database entries for specific genes. Note that, depending on
your particular database, if some genomic identifiers are not
available then the corresponding "Enable display current gene in
genomic DB Web browser" entry will not appear in the menu.
2.5.1 Logging MAExplorer messages
MAExplorer shows various data measurements as well as many other types
of information in the three text lines in the status area of the main
window. The Show log of messages command pops up a scrollable log of
all messages sent to the three-line status area. This is useful for
recording measurements and other activity. The messages may be saved
in a log file (typically maeMessages.log). Figure 2.5.2 shows an example
of the messages popup log window. Clicking on genes in the pseudoarray
image or in plots will log the gene data (see Section 3.3) given the
current normalization, Samples use (single or multiple), and
pseudoarray display mode. The current values of all of the State
Threshold scrollers are saved in the message log when the (Edit menu |
Preferences | Adjust all Filter threshold scrollers) State
Thresholds popup window is closed. This is useful for capturing the
current settings at any time.
2.5.2 Logging command history
During a data mining session, the user will typically execute many
commands from the menu as well as click on genes in the pseudoarray
image or in plots. It is useful to record the steps you took during
this analysis. The Show log of command history command pops up a
scrollable history of all commands issued to MAExplorer. The commands
are automatically numbered. The history may be saved in a log file
(typically maeHistory.log). Figure 2.5.2 shows an example
of the command history popup log window.
2.6 Plugins menu
MAExplorer may be extended by users with new analysis methods using
Java plugins. We call these new methods MAEPlugins. They are small
Java programs written by users that may be dynamically loaded into
MAExplorer and then applied to their data. These plugins will include
those written by LECB and those written by academic or commercial
groups. See the MAEplugins for
details. If you have a compiled Java plugin in the form of either a
Java .class or .jar file, you may load it at run time using the "Load
plugin" command in the Plugins menu. If specified in the MAEPlugin, it
will be added to the appropriate menu in the MAExplorer menu tree at
the end of the specified submenu (see Appendix C. Table C.5.7). If
this submenu "stub" is not specified, it will be placed in the list of
plugins in the Plugins menu (e.g. plugin #1, ..., plugin
#n).
The Plugins menu includes:
RLO methods menu
This contains a list of executable R
analysis methods (called R LayOuts or RLOs, created with the
RtestPlugin) for evaluating MAExplorer data with R analysis scripts.
It is only available if you have installed the R language program (www.r-project.org) on your
computer. An RLO analysis allows you to automatically export data from
MAExplorer, process it with the associated R program, and import the R
results back into MAExplorer. [This is under development and is
alpha-level.] A recent poster on
Extending MAExplorer with R is available as a PDF file.
2.6.1 Example of using a Plugin
This is a short demonstration of what is involved in using a
MAEPlugin. Generally the
plugin .jar or .class files are stored in the Plugins/ directory
where you have installed MAExplorer. The user first loads a particular
plugin from the disk, which installs it in the Plugins pull-down menu. Then they
revisit that menu to invoke the particular plugin. You may load any
number of plugins (until you run out of computer memory, if that should
occur).
2.7 Help menu
Various on-line help and documents are available if you are connected
to the Internet. These will appear in a separate pop-up Web browser
window so you may view them while working with MAExplorer. This
includes on-line documentation (including this reference manual),
tutorials, and other information. There may also be links to other Web pages
describing key areas of specific databases. For example, for the MGAP
database, the links point back to key areas of MGAP including the MGAP Animal Models, Histology
atlas, etc. You can then use the browser's "Save as" and "Print"
options to save the data to a file or print it.
Database-specific help menu entries -
entries defined for a particular database (see below)
2.7.1 Adding custom help links to your database to the Help menu
These Database-specific help menu
entries are keyed to the database you are
using and may be
customized by the database maintainer in the configuration file
(Section C.5.6) to link to pages relating to the particular database. For
example, database specific help for the MGAP database is:
3. Exploratory Data Analysis - Introduction to Data Mining
Data mining is the uncovering of relevant patterns of interest in data
from a particular problem domain (Tukey, 1977). Typically this
involves using various statistical techniques, including cluster
analysis, to identify the patterns. See
StatSoft Inc's 2002 on-line statistics textbook for definitions of
clustering and other statistical terms. Researchers across a wide
range of fields such as (Tufte, 1997) and (Cleveland,
1985) have suggested that a major aspect of this problem is
finding the correct means of graphical presentation to allow humans to
be a part of the pattern recognition process. Tufte argues that the
proper display of quantitative data in the context of the problem
domain can aid in the understanding of complex sets of data. This
carries over to the analysis of microarrays, where successful data
mining requires statistical, genomic knowledge database, and graphical
components. (Jagota, 2001) discusses a number of methods and applications for
microarray data analysis and visualization. Other useful resources are
the sets of papers in ("Chipping Forecast", Nature Genetics supplement, Jan,
1999), and ("Chipping Forecast II", Nature Genetics supplement, Dec,
2002).
Organization of Sections in this Chapter
3.1 Objectives in data mining, discovery and analysis
There are a number of objectives an investigator has when analyzing a
set of data. The types of analyses and how useful they are depend on
what the investigator wishes to get out of the analyses as well as the type of
data.
Recording the analysis steps during your data mining session -
command history
Because of the iterative nature of this process, you might want to
keep a record of the commands you have used or the messages and
measurements you have made. To do this you need to enable message and
command history logging. Go to the View pull-down menu and then select
the type of logging you want using the Show log of messages or the
Show log of command
history commands.
3.1.1 Some experimental design issues of microarray experiments
*** THIS SUBSECTION IS IN THE PROCESS OF BEING UPDATED ***
Comparing HP-X/HP-Y for Cy3/Cy5 data as 'ratio of ratios'
If we have two samples HP-X and HP-Y with a common reference sample P
(e.g. Cy5P), then we would be comparing the HP-X
"intensity" Cy3X/Cy5X against the HP-Y
"intensity" Cy3Y/Cy5Y. Alternatively,
you can label the common reference sample P with Cy3, in which case just
swap Cy3 and Cy5 in these equations. If you are using a common
reference standard (i.e. Cy5X1 is the same sample
as Cy5Y1, e.g. a pooled sample
Cy5P), then
a) (Cy3X/Cy5X1) / (Cy3Y/Cy5Y1)
becomes
b) (Cy3X/Cy3Y)
However, this new comparison is accompanied by additional noise
because of the use of the two Cy5P intermediaries. With a pooled
reference, a series of samples A, B, ..., N is run as
(Cy3A/Cy5P), (Cy3B/Cy5P), ... , (Cy3N/Cy5P)
This assumes that there is enough of the pooled sample P to be used
for all of the experiments - otherwise additional sources of
error would be introduced. MAExplorer is ideally used with this
common reference sample P.
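The cancellation of the common reference in the 'ratio of ratios' can be checked numerically. The sketch below uses hypothetical intensities for a single gene; the second calculation assumes the two Cy5P readings differ slightly between arrays, which is one way to picture the additional noise mentioned above.

```python
def ratio_of_ratios(cy3_x, cy5_x, cy3_y, cy5_y):
    """(Cy3X/Cy5X) / (Cy3Y/Cy5Y) for one gene."""
    return (cy3_x / cy5_x) / (cy3_y / cy5_y)

# Hypothetical intensities for one gene; the reference channel P is shared.
cy3_x, cy3_y = 1200.0, 400.0
cy5_p = 800.0

# Identical reference readings cancel, leaving Cy3X/Cy3Y.
with_reference = ratio_of_ratios(cy3_x, cy5_p, cy3_y, cy5_p)
direct = cy3_x / cy3_y
print(with_reference, direct)

# If the two Cy5P readings come from different arrays and differ slightly,
# the cancellation is only approximate.
noisy = ratio_of_ratios(cy3_x, 820.0, cy3_y, 790.0)
print(noisy)
```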
If a common pooled sample is not used, then the experimental
design becomes more complicated - especially if dye-swap experiments
are performed for all samples. For N samples taken 2 at a time
(i.e. Cy3 and Cy5), the number of experiments may be impossibly
large to perform for other than a very small N. E.g. for N of 3, the
number of experiments is 3, and 6 if dye-swap experiments are also
performed. For N of 4, the number of experiments is 6 and 12. And
this is without doing any replicate experiments. If a reasonable
number of replicates is added, then this set of experiments becomes
even more difficult to perform.
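The counts quoted above follow from choosing 2 of N samples, doubled when dye swaps are included. A short sketch (the function name and the replicates parameter are illustrative assumptions) shows how quickly the number grows:

```python
from math import comb

def pairwise_experiments(n_samples, dye_swap=False, replicates=1):
    """Two-color hybridizations needed to compare every pair of samples directly."""
    pairs = comb(n_samples, 2)       # N samples taken 2 at a time
    if dye_swap:
        pairs *= 2                   # each pair is run again with dyes swapped
    return pairs * replicates

for n in (3, 4, 10, 20):
    print(n, pairwise_experiments(n), pairwise_experiments(n, dye_swap=True))
```

By contrast, a common pooled reference design of the form (Cy3A/Cy5P), ..., (Cy3N/Cy5P) needs only one hybridization per sample before replicates.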
[(Cy3X/Cy5Y) + 1.0/(Cy3Y/Cy5X)]/2
In general, this is probably not a very good estimate.
3.1.2 Design philosophy of MAExplorer methodology
There are several ways to implement a data mining system on moderate
size databases. The first is that all computations are performed on a
Web server and the user's Web browser displays the results. The second
is to download an applet from the Web server, get the data from the
Web server and do computations in the Web browser. A third way is to
download data from a Web server and run a local stand-alone program on
the data. MAExplorer can be run in both the second and third
ways. However, we encourage the use of the stand-alone paradigm as
having the best bandwidth and being the most robust.
The browser-based computation paradigm (as opposed to server-based) is
somewhat unusual. It keeps both the program and data on the server,
making user maintenance of the latest versions easier than if they had
to constantly upgrade the program or data. This also has the distinct
advantage of giving the user instantaneous feedback through rapid
visual and tabular views and the ability to more effectively navigate
the data since the analysis is done on their desktop computer. Because
it is easy to access reference data from other genomic sources
(e.g. UniGene, GenBank, NCI/CIT's mAdb clone DB, dbEST, GeneCard,
etc.), it can be accessed from their respective Web servers as needed.
Complex browser-based computations are used in other data mining or
intensive computation domains. With the increased bandwidth of the
Internet and compute power and memory of PCs approaching the Cray
supercomputers of the previous decade, this paradigm becomes even more
feasible. However there are limits to how well it scales because of
Web browser limitations. Appendix E.2
discusses these issues in more detail.
3.1.3 Evolution of MAExplorer from earlier proteomic data mining systems
MAExplorer was designed to do flexible exploratory quantitative data
analysis of gene data from microarray hybridized sample experiments.
Many of the data-mining concepts are derived from a system called GELLAB-II
(http://www.lecb.ncifcrf.gov/lemkin/gellab.html), a UNIX-based
stand-alone exploratory data analysis system for 2D protein gels over
multiple experiments (Lipkin and Lemkin, 1981). See also a review (Lemkin
and Lester, 1989) and examples of graphical representations of
this type of data (Lemkin, 1995). An on-line
GELLAB-II Web-Poster
(http://www.lecb.ncifcrf.gov/lemkin/gellab-ep93wd.html) is available
showing various screen shots of GELLAB-II in action. Whereas GELLAB
works with sets of corresponding spots (i.e. proteins) across sets of
2D gel samples, MAExplorer works with sets of genes (spots in the
microarray) across sets of hybridized sample microarrays. With
protein gels, one typically has spot alignment problems since gels are
generally not superimposable. This is often called the rubber-sheet
distortion problem and requires localized alignment of spots based on
neighboring spot constellation morphology. We have used Web-based
methods to visually compare gels, including the Flicker
(http://www.lecb.ncifcrf.gov/flicker/) image comparison system, a Java
applet (Lemkin, 1997), and the 2DWG
(http://www.lecb.ncifcrf.gov/2dwgDB/) meta-database of 2D gel images
(Lemkin, 1999a). Since the genes
are precisely spotted on the arrays, aligning spots between arrays is
not required, which greatly simplifies the data analysis problem.
3.1.4 Concepts used in data mining with MAExplorer
This section introduces some of the concepts used in data mining
microarrays with an emphasis on how they are used with MAExplorer.
Gene data filters - a Boolean AND of gene set tests
A primary MAExplorer concept is that of the gene data filter, which selects
a working set of genes by the conjunction (Boolean AND) of user
selectable tests. Each test further restricts the working set of genes
to those meeting the test. These criteria include gene membership in
particular gene classes, membership in particular user defined or
computed gene subsets, and meeting a variety of statistical
constraints. Statistics include intra- and inter-array CV and X-Y sets
t-tests. Range test criteria include X/Y ratio ranges and histogram
bins, and intensity ranges and histogram bins. Membership criteria
include tests of whether genes are in the current cluster (derived from
cluster analysis), gene set membership, etc. By selectively including
one or more of these filter restrictions, the user can home in on the
data that appears to be of interest. Of course, as in real mining, what
appears interesting may turn out not to be interesting on further
investigation.
Set operations on gene subsets
Because of the complexity of comparing many different replicated
samples, it may be difficult to manually organize the resulting
comparisons. MAExplorer offers set-theoretic operations on sets of genes and sets of hybridized samples
(i.e. intersection, union, difference) to help with this organization
(step 9 in Table 3.2). The results
of set operations may be saved and used in subsequent set operations,
normalization, as well as with the data filter. This is useful when
comparing and documenting procedures, methods, and analyses from
several subsets of experiments.
User exploration states
Users need to be able to save and restore the current state of their
explorations of the data and option settings in order to document them
and continue at later times. When running in stand-alone mode, the user
may save their data mining session on the local disk as named startup
files (.mae file extension). Clicking on a startup file will
restart MAExplorer and restore the state to that of the time it was
saved. In addition to filter and parameter status, the HP-X, HP-Y,
HP-X and HP-Y 'sets', HP-E 'list', the named gene sets and HP
condition sets are saved as part of the state.
User groupware sharing of exploration states with collaborators
[In the future], these could be saved on a public Web server using
multiple named state files. These would be protected for the user by a
login procedure. Groupware sharing of these intermediate
exploratory results would be available when the user allows another user to
access selected states. User states and groupware sharing complete
step 11 in the analysis described in Table 3.2.
3.2 Steps in data mining, discovery, and analysis
An analysis scenario may use many methods for viewing the data. A
typical sequence of analysis steps is listed below in Table 3.2 in the
order they might be performed. Note that this is a rough guide
for a possible analysis. Iterating and backing up over
some of these steps is required for data mining complex sets of
conditions, especially in the setting of constraints for the "data
filter" (step 4) when the user focuses on subtle patterns of interest
(cf. Figure 3.1).
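The data filter of step 4 is, as described in Section 3.1.4, a Boolean AND of user-selected tests. The following sketch illustrates that idea only; the gene records, thresholds, and saved gene set are hypothetical, and the code does not reflect how MAExplorer stores genes internally.

```python
def apply_filter(genes, tests):
    """Keep the genes that pass every enabled test (Boolean AND)."""
    return [g for g in genes if all(test(g) for test in tests)]

# Hypothetical gene records: name plus X and Y intensities.
genes = [
    {"name": "g1", "x": 5200.0, "y": 1300.0},
    {"name": "g2", "x": 90.0, "y": 10.0},      # high ratio, but near noise
    {"name": "g3", "x": 4000.0, "y": 3900.0},
    {"name": "g4", "x": 800.0, "y": 3600.0},
]
saved_set = {"g1", "g3", "g4"}   # a previously saved named gene set

tests = [
    lambda g: min(g["x"], g["y"]) >= 100.0,                      # intensity range
    lambda g: g["x"] / g["y"] >= 2.0 or g["x"] / g["y"] <= 0.5,  # ratio range
    lambda g: g["name"] in saved_set,                            # set membership
]

working = apply_filter(genes, tests)
print([g["name"] for g in working])
```

Adding or removing a test from the list and re-running the filter mirrors the iterate-and-back-up style of exploration that this section describes.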
Scatter plots are useful for visualizing data from two conditions
The scatter plot method (step 5) allows the user to plot the intensity
data between two samples, the X-sample and the Y-sample. Gene data
may be spot data for two different samples (HP-X and HP-Y), means of
two different sets of hybridized samples of replicate samples (sets of
HP-X and sets of HP-Y), or the left and right normalized replicate
data (F1 vs. F2) for the current hybridized sample. If Cy3/Cy5 data is
used, then each sample is the ratio of data from two different
hybridized samples. So if we have sample Cy3a and Cy3b then HP-X could
be Cy3a/Cy5 and HP-Y could be Cy3b/Cy5 such that we are scaling the
Cy3a and Cy3b samples using a common Cy5 normalization sample.
Scatter plots are useful for obtaining a better understanding of the
outliers when comparing different hybridized samples and determining
the reproducibility of spotting when comparing F1 vs. F2 data or
replicate sample data.
Filtering genes by histogram plots of ratios, Zdiffs or intensities
Histogram plots may be generated from either X/Y ratios or (X-Y)
Zdiffs of two different hybridized samples (single samples or X and Y
replicates) or from the F1/F2 intensities of a single hybridized
sample. Selecting a bin in a histogram restricts filtered genes to
those that are contained in that histogram bin. As an alternate
method, data filtering by ratio (Zdiff) or intensity range may
be used with adjustable range scrollers independent of the
histograms. However, histograms and scrollers may be used together.
For example, one could filter by the ratio histogram after filtering
out genes with low-intensity values that may be considered noise using
the intensity sliders. That might help eliminate falsely high ratios
resulting from dividing high X values by very small, noisy Y values.
Histograms are useful for getting a better understanding of the range
and distribution of the gene intensities or ratios.
Expression profile plots (EP-plot) of N conditions for viewing time series, etc.
List HP-E is an ordered list of samples, in contrast to HP-X and
HP-Y, which are unordered sets of samples. The expression profile (step
5) of a gene is the plot of its normalized intensity as a function of
the samples in the ordered HP-E list. It may be plotted for the
current gene in a pop-up window. Selecting a different current gene
causes the EP-plot to be displayed for that gene. Multiple EP-plots
may be created to view the differences between a few genes you are
investigating further. The HP name button pops up a window with
the ordered list of samples so you can see the details of the sample
names being plotted. Selecting a line in a plot displays the intensity
data and sample name for that hybridized sample. The data may be
plotted as a bar, point or continuous curve, and error bars may be
turned off to better compare multiple plots.
Finding clusters of genes with similar expression profiles: similar, cluster counts, K-means, and hierarchical methods
We may define a cluster of genes as a set of genes whose expression
profiles are found to be similar (step 6). The samples used in
computing the expression profiles are specified by the HP-E ordered
list. You can scale the list of normalized intensity data for each
gene to 1.0 (resulting in finding genes with similar shaped
EP-plots). Alternatively, if you don't scale this data it will
cluster more on magnitude changes. You can select either the Euclidean
distance or the correlation coefficient of the EP lists between two
genes as the measure of gene-gene distance. Similarity is 1.0 -
normalized distance.
Gene reports: dynamic spreadsheets for Web access or tab-delimited for exporting data to Excel
Pop-up report windows (step 7) may be generated for either individual
genes or global array sample data. Instances of the latter include
experimental information and Web links, global statistics, correlation
coefficients between array samples, etc. Gene reports may present
this data in a number of ways. These include: highest/lowest gene
ratios, profiles, parametric and cluster statistics, etc. Reports may
be presented as either dynamic Web-interactive spreadsheet tables or
as static tab-delimited tables. The latter is useful for exporting
data using cut and paste into Excel (step 10). If the user clicks on
a blue hyperlinked cell in a dynamic spreadsheet table, it pops up
another Web browser window and loads it with data (step 8) from the
respective Internet genomic database such as mAdb Clone DB, UniGene,
GenBank, dbEST, GeneCard, and MGAP model and histology Web pages.
Collaborative groupware environment
Having immediate access to collaborators' data is a powerful research
tool. A collaborative environment is being implemented for MAExplorer
that allows groups of users to share data and intermediate
results. These capabilities include: 1) the ability to save and
restore exploratory data mining sessions (states) through the Web
server including named sets of genes, and 2) to selectively share
these states with collaborators. The latter process is sometimes
called a groupware environment because it offers a collaborative group
the ability to share and interact. These capabilities are modeled
after our WebGel system (Lemkin et al., 1999b). In addition, users can create a custom database
Web page as a subset of samples from the entire database. This may be
saved on their own computer through their Web browser's "File/Save as"
command. This hypertext file could then be used at a later time to
access the database or be E-mailed to a collaborator to do the same.
3.2.1 Definition of expression profile
It is helpful to define an expression profile. There may be alternate
definitions, but the following is useful for getting an understanding
of how it might be computed. An expression profile
$e_j$ of an ordered list of N samples (k = 1
to N) for a particular gene j is a vector of scaled
expression values $v_{jk}$:
$$e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$$
A difference between two genes p and q may be estimated as an
N-dimensional metric "distance" between $e_p$ and
$e_q$. The Euclidean distance is then defined as
$$d_{pq} = \left( \frac{1}{N} \sum_{k=1}^{N} (v_{pk} - v_{qk})^2 \right)^{1/2}$$
Other distance measures may include the correlation coefficient,
city-block (or Manhattan) distance, etc. The similarity is 1 minus the
normalized distance:
$$s_{pq} = 1 - d_{pq}$$
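The definitions above can be sketched in a few lines of Python. This is only an illustration, not MAExplorer's own code: the function names and the sample profiles are hypothetical, and the profiles are assumed to be scaled so that each value lies in [0, 1], which keeps the distance (and therefore the similarity) in [0, 1] as well.

```python
import math

def euclidean_distance(ep, eq):
    """Distance d_pq between two expression profiles of equal length N."""
    if len(ep) != len(eq):
        raise ValueError("profiles must cover the same N samples")
    n = len(ep)
    # Mean of the squared per-sample differences, then the square root.
    return math.sqrt(sum((vp - vq) ** 2 for vp, vq in zip(ep, eq)) / n)

def similarity(ep, eq):
    """Similarity s_pq = 1 - d_pq (assumes d_pq is normalized to [0, 1])."""
    return 1.0 - euclidean_distance(ep, eq)

# Hypothetical profiles for genes p and q over N = 4 samples, scaled to max 1.0.
e_p = [0.2, 0.4, 1.0, 0.6]
e_q = [0.2, 0.4, 0.8, 0.6]
d_pq = euclidean_distance(e_p, e_q)
s_pq = similarity(e_p, e_q)
print(f"d_pq={d_pq:.3f} s_pq={s_pq:.3f}")
```

Dividing by N before taking the square root keeps the distance comparable between HP-E lists of different lengths.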
3.2.2 Clustering Methods
Clusters represent one way to identify similar gene expression across
a set of experiment samples. There are many ways to cluster the data,
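some of which are available in MAExplorer. These include clustering
genes similar to a seed gene, K-means clustering, and hierarchical
clustering, each described in the subsections that follow.
Other methods include Self-Organizing Maps (SOM), fuzzy clustering,
Support Vector Machines (SVM), etc.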
3.2.2.1 Clustering similar genes
If we have a particular gene s (the "seed" gene), we may want
to find the set of all genes {$g_j$} similar to
$g_s$. We can find this set by testing each gene against the seed.
We define a particular gene $g_j$ as similar to the seed
gene if the distance between genes s and j meets
the following criterion.
$$d_{js} < T$$
The threshold T is set by the investigator and in MAExplorer
is changed using a slider. Typically, the set of all genes
{$g_j$} found is sorted by similarity before being
viewed.
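A minimal sketch of the seed-gene test, assuming the 1/N-normalized Euclidean distance of Section 3.2.1 and profiles scaled to [0, 1]. The gene names, profile values and the function `similar_genes` are hypothetical and only illustrate the thresholding and sorting steps.

```python
import math

def distance(ep, eq):
    """1/N-normalized Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ep, eq)) / len(ep))

def similar_genes(profiles, seed, threshold):
    """Return [(gene, similarity)] for genes with d_js < threshold, most similar first."""
    seed_profile = profiles[seed]
    hits = []
    for gene, profile in profiles.items():
        if gene == seed:
            continue  # the seed itself is not reported as a match
        d = distance(profile, seed_profile)
        if d < threshold:
            hits.append((gene, 1.0 - d))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits

# Hypothetical scaled expression profiles over three samples.
profiles = {
    "geneS": [0.2, 0.5, 1.0],
    "geneA": [0.2, 0.5, 0.9],
    "geneB": [1.0, 0.5, 0.2],
    "geneC": [0.25, 0.5, 1.0],
}
result = similar_genes(profiles, "geneS", threshold=0.1)
for gene, sim in result:
    print(gene, round(sim, 3))
```

Lowering the threshold shrinks the returned set, which corresponds to moving the cluster distance slider to smaller values.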
3.2.2.2 K-means clustering
K-means clustering finds K clusters of genes with similar expression
profiles to a given gene (see Sneath and Sokal, 1973). Given the number of clusters K,
we could use high variance of clusters to determine if they should
split into sub-clusters. K-means clustering does not need a distance
matrix (see Hierarchical clustering which follows), so it is faster
and may cluster large numbers of N genes. However, it is
highly dependent on seed selection. It may be useful for getting an
initial estimate - especially if other techniques (such as silhouette
plots) are also used. The following is a simplified definition of one
way to compute a set of K-means clusters of gene expression profile
data.
Algorithm:
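As an illustration, here is a minimal sketch of one common K-means variant (Lloyd's iteration). It is not taken from MAExplorer and may differ from the steps the program uses. In particular, seeding with the first K profiles is an assumption made to keep the example deterministic; as noted above, the result depends strongly on seed selection.

```python
def kmeans(profiles, k, max_iter=100):
    """Lloyd-style K-means on a list of equal-length expression profiles.

    Seeds are simply the first k profiles (an assumption for this sketch).
    Returns (assignments, centroids).
    """
    centroids = [list(p) for p in profiles[:k]]
    assignments = [None] * len(profiles)
    for _ in range(max_iter):
        # Assign each profile to its nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_assignments == assignments:
            break  # converged: no gene changed cluster
        assignments = new_assignments
        # Recompute each centroid as the mean profile of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical scaled profiles: two genes rising, two genes falling.
profiles = [
    [1.0, 0.9, 0.1],
    [0.1, 0.2, 1.0],
    [0.9, 1.0, 0.2],
    [0.2, 0.1, 0.9],
]
assignments, centroids = kmeans(profiles, k=2)
print(assignments)
```

No N x N distance matrix is built; each pass only compares N profiles with K centroids, which is why the method scales to large gene sets.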
3.2.2.3 Hierarchical clustering
Hierarchical clustering of a set of genes will generate a binary tree
of clusters with the genes at the terminal ends of the tree and a
single cluster of the entire tree at the top (also called the root) of
the tree. See (Sneath and Sokal, 1973) for a discussion on hierarchical clustering. There are many
other variants of hierarchical clustering. Hierarchical clustering
requires a distance matrix or the equivalent of one [there are more
efficient ways to compute it]. For N genes (terminal
clusters), it generates 2N-1 clusters. The distance matrix is an
upper-triangular matrix D of $d_{pq}$ with
N(N-1)/2 entries.
Algorithm:
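A minimal sketch of bottom-up (agglomerative) hierarchical clustering follows. It is not MAExplorer's implementation: the average-linkage rule, the function names and the sample data are assumptions chosen for illustration. It does show the two counts given above, N(N-1)/2 stored distances and 2N-1 clusters.

```python
import math
from itertools import combinations

def distance(ep, eq):
    """1/N-normalized Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ep, eq)) / len(ep))

def hierarchical_clusters(profiles):
    """Agglomerative clustering; returns (clusters, dist).

    clusters lists all 2N-1 clusters as tuples of gene indices, terminal
    clusters first and the root last. dist is the upper-triangular
    distance matrix stored as a dict with N(N-1)/2 entries.
    """
    n = len(profiles)
    dist = {(p, q): distance(profiles[p], profiles[q])
            for p, q in combinations(range(n), 2)}

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b
        # (average linkage is an assumption; other rules are possible).
        total = sum(dist[(min(p, q), max(p, q))] for p in a for q in b)
        return total / (len(a) * len(b))

    active = [(i,) for i in range(n)]  # terminal clusters, one per gene
    clusters = list(active)
    while len(active) > 1:
        # Merge the two closest active clusters into a new parent cluster.
        a, b = min(combinations(active, 2), key=lambda pair: linkage(*pair))
        active.remove(a)
        active.remove(b)
        merged = tuple(sorted(a + b))
        active.append(merged)
        clusters.append(merged)
    return clusters, dist

# Hypothetical scaled profiles for N = 4 genes.
profiles = [
    [1.0, 0.9, 0.1],
    [0.1, 0.2, 1.0],
    [0.9, 1.0, 0.2],
    [0.3, 0.1, 0.9],
]
clusters, dist = hierarchical_clusters(profiles)
print(clusters)
```

This naive version re-evaluates every pair of active clusters at each merge, so it is only practical for small gene sets.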
3.3 Display of gene spot intensity and identification data measurements
You may select the current gene by clicking in the pseudoarray image or in the X-Y scatter plot.
MAExplorer then reports the
microarray grid coordinates, normalized quantified spot intensity
data, plate coordinates, gene name (if known) and associated data for
that gene. If you are displaying a pseudocolor ratio (X/Y) or Zdiff
(X-Y) image, it will report HP-X/HP-Y or (HP-X - HP-Y) data
respectively. It also sets the gene as the current gene. The
pseudoarray image coordinates are reported as:
[<field>-<grid name><row#>,<col#>].
e.g.
[1-A4,3]
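If you save these report lines and want to process them outside MAExplorer, the coordinate prefix can be picked apart with a regular expression. The following is a sketch only: the function name is hypothetical, and it assumes that grid names consist of letters and that field, row and column are plain integers, as in the examples in this section.

```python
import re

# [<field>-<grid name><row#>,<col#>], e.g. [1-A4,3]
COORD = re.compile(r"\[(?P<field>\d+)-(?P<grid>[A-Za-z]+)(?P<row>\d+),(?P<col>\d+)\]")

def parse_spot_coordinate(text):
    """Extract the first pseudoarray coordinate found in a report line."""
    m = COORD.search(text)
    if m is None:
        raise ValueError(f"no spot coordinate found in {text!r}")
    return {
        "field": int(m["field"]),
        "grid": m["grid"],
        "row": int(m["row"]),
        "col": int(m["col"]),
    }

coord = parse_spot_coordinate("[1-A4,3]")
print(coord)
```

The plate coordinate, written as plate[5,A,5], has no field prefix and is deliberately not matched by this pattern.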
[1-A4,5] intensity=4.5267, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
b) Field F1 and F2 replicate spots for a single sample. The top line
is shown for each of the different normalization methods.
[1-A4,5] intensity[F1]=-0.3067, intensity[F2]=-0.2312, F1-F2=-0.0755, (Norm.: Zscore intensity)
[1-A4,5] intensity[F1]=4.5267, intensity[F2]=6.2408, F1/F2=0.7253, (Norm.: median intensity)
[1-A4,5] intensity[F1]=0.8755, intensity[F2]=1.1457, F1-F2=-0.2701, (Norm.: log median intensity)
[1-A4,5] intensity[F1]=-0.1442, intensity[F2]=-0.0945, F1-F2=-0.0497, (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] intensity[F1]=-0.1533, intensity[F2]=-0.1004, F1-F2=-0.0528, (Norm.: Z-score, mean abs.deviation, log intensity)
[1-A4,5] intensity[F1]=630.9911, intensity[F2]=869.9273, F1/F2=0.7253, (Norm.: calibration DNA intensity)
[1-A4,5] intensity[F1]=1919.9376, intensity[F2]=2646.957, F1/F2=0.7253, (Norm.: scale to max. (65K) intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
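In the lines above, the two replicate fields are compared as a ratio (F1/F2) for the intensity-scaled normalizations and as a difference (F1-F2) for the Z-score and log normalizations. The sketch below reproduces that choice. The function name and the boolean flag are hypothetical, and rounding to four decimals is assumed only to match the way the report lines are printed.

```python
def compare_replicates(f1, f2, log_or_zscore):
    """Compare replicate spots F1 and F2 of one gene in one sample.

    Returns ("F1-F2", difference) for Z-score or log normalized data,
    otherwise ("F1/F2", ratio).
    """
    if log_or_zscore:
        return "F1-F2", round(f1 - f2, 4)
    return "F1/F2", round(f1 / f2, 4)

# Values taken from the report lines for spot [1-A4,5] above.
median_label, median_value = compare_replicates(4.5267, 6.2408, log_or_zscore=False)
zscore_label, zscore_value = compare_replicates(-0.3067, -0.2312, log_or_zscore=True)
print(median_label, median_value)
print(zscore_label, zscore_value)
```

A difference is the natural comparison once the data is on a log or Z-score scale, where values may be zero or negative and a ratio would be misleading.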
If the "Pseudocolor HP-X/HP-Y ratio or Zdiff" option is selected in
the "Show Microarray" submenu, data is reported as either Ratio or
Zdiff data depending on the normalization method selected. The data
used in the following examples is for C57B6 pregnancy day 13 (HP-X)
compared with Stat5a (-,-) pregnancy day 13 (HP-Y).
[1-A4,5] HP-XY: mn(X,Y)=(5.383,6.834) (X/Y)(F1,F2,mean)=(0.651,0.928,0.787), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
d) Zdiff data for two separate samples X and Y. Ratio data for the
field F1 and F2 spot data as well as the mnX-mnY Zscore difference is
reported. The three Zscore, ZscoreLog, and logMean normalizations were
used in this example (first lines are shown).
[1-A4,5] HP-XY: mn(X,Y)=(-0.269,0.151) (X-Y)(F1,F2,mean)=(-0.470,-0.370,-0.420), (Norm.: Zscore intensity)
[1-A4,5] HP-XY: mn(X,Y)=(-0.119,0.051) (X-Y)(F1,F2,mean)=(-0.199,-0.142,-0.170), (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] HP-XY: mn(X,Y)=(1.010,1.224) (X-Y)(F1,F2,mean)=(-0.362,-0.064,-0.213), (Norm.: log median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
e) Example of when the "Use dual HP-X & HP-Y Pseudoimage" mode is
enabled in the "Show Microarray" submenu of the "Plot" menu. This
displays mean data for the HP-X and HP-Y data side-by-side. The median
normalization was selected.
[1-A4,5] intensity[X]=5.3837, intensity[Y]=6.8342, X/Y=0.7877, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
Reporting for multiple hybridized samples when using HP-X/-Y 'sets'
If you have enabled MAExplorer to "use HP-X and HP-Y 'sets' of
multiple samples rather than single samples" in the Samples menu, it
will report a spot differently using the means (mn), standard
deviations (S.D.), and coefficients of variation (CV) for the samples in
the HP-X and HP-Y 'sets'. For duplicate fields, these are computed
using the normalized average of F1 and F2 spots for each gene in each
sample. The data used in the following examples is for three C57B6
pregnancy day 13 (HP-X) samples, and five Stat5a (-,-) pregnancy day
13 (HP-Y) samples.
[1-A4,5] HP-X 'set' mean intensity=3.295 stdDev=1.482 CV=0.449 n=3, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
g) Multiple HP-XY 'sets' using median normalization for the pseudoarray image
display for the HP-Y 'set' of five Stat5a (-,-) samples.
[1-A4,5] HP-Y 'set' mean intensity=8.180 stdDev=0.986 CV=0.120 n=5, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
h) Multiple HP-XY 'sets' using median normalization for the
pseudoarray image display for the HP-X and HP-Y 'sets' when the "Use dual
HP-X & HP-Y Pseudoimage" mode is enabled in the "Show Microarray"
submenu of the "Plot" menu.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120)\
n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
i) Multiple HP-XY 'sets' using median normalization for ratio (HP-X/HP-Y) data
for the "Pseudocolor HP-X/HP-Y Ratio or Zdiff" display.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120) \
n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
[1-A7,20] HP-XY: mn(X,Y)=(3.449,0.853) (X/Y)(F1,F2,mean)=(4.09,4.008,4.041), (Norm.: median intensity)
CloneID: 1382656, dbEST5': 1775754, GenBank 5': AI036495, UniGene: Mm.300, plate[12,A,8]
GeneName: Carbonic anhydrase 3
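The 'set' statistics reported in the examples above (mean, stdDev, CV, n and mnX/mnY) can be sketched as follows. The intensities are made-up values, not the MGAP data, and the use of the sample standard deviation (dividing by n-1) is an assumption, since this section does not state which form MAExplorer uses.

```python
import statistics

def set_stats(values):
    """Mean, standard deviation, coefficient of variation and n for one 'set'."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample S.D. (n-1); an assumption
    return {"mean": mean, "stdDev": sd, "CV": sd / mean, "n": len(values)}

# Hypothetical normalized intensities of one gene in each sample of a set.
hp_x = [2.0, 3.0, 4.0]            # three HP-X samples
hp_y = [7.0, 8.0, 9.0, 8.0, 8.0]  # five HP-Y samples

x_stats = set_stats(hp_x)
y_stats = set_stats(hp_y)
ratio = x_stats["mean"] / y_stats["mean"]  # reported as mnX/mnY
print(x_stats, y_stats, round(ratio, 3))
```

Because the report lines print rounded values, recomputing the CV or mnX/mnY from the printed mean and stdDev may differ from the printed figure in the last digit.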
Reporting for Cy3 and Cy5 channels for a single hybridized sample
k) If you have Cy3/Cy5 data, then you can look at the two channels for
a single sample (the current HP sample). The following example uses median
normalization with the display set to "Pseudocolor Red(Cy5)-Yellow-Green(Cy3) Cy3/Cy5
data".
[1-A6,11] Cy5/Cy3=0.3588, Cy5=67.324, Cy3=187.622, (Norm.: median intensity)
CloneID: IMAGE:1054189,
GeneName: expressed sequence AW213287
Reporting HP-X (Cy3 or Cy5) vs HP-Y (Cy3 or Cy5) data for 2 samples
l) If you want to compare Cy3 or Cy5 in the HP-X sample with a Cy3 or Cy5
value in the HP-Y sample, you do it through the special Cy3,Cy5
scatter plots. There are four types of plots:
After the plot is started, clicking on a point in the scatter plot reports
the data for that point, as shown in
the following example where HP-X Cy3 is plotted against HP-Y Cy3.
[1-A5,16] intensX=4.695, intensY=5.923, (X-Y)=-1.2275, (Norm.: log median intensity)
CloneID: IMAGE:963758,
GeneName: RIKEN cDNA 2410114O14 gene
3.4 Selecting subsets of genes using the data Filter
Genes may be selected on a number of criteria specified by the data Filter (Section 2.4.3)
that is a cascade of data tests. The first might be the gene class (Section 2.4.1) to
restrict the set of all genes to a particular subset. Various numeric
and statistical data tests might be applied on the remaining genes to
exclude those not meeting these tests. For example, genes having a
high coefficient of variation between duplicate spots on the same
sample or duplicate samples could be eliminated. Then one could
select genes that had a (HP-X/HP-Y) ratio greater than 4.0 but less
than 8.0, etc. The latter could be done using either the ratio
scrollers or by clicking on that bin in the ratio histogram plot. See
the Filter menu options and look at one of the tutorials (Appendix A) for ideas on
adjusting the Filter to close in on a particular subset of genes.
3.5 Selecting subsets of hybridized sample conditions
Sets and lists of hybridized sample conditions (HP-X and HP-Y sets,
HP-E list) may be selected using various commands from the Samples Menu (Section 2.2)
including pull-down menus, guessing by name or part of a name, or
using a "Chooser" (see Figure 2.2.1) to design your settings for the
current (HP-X and HP-Y sets, HP-E list). The Chooser is the easiest
way to select these entries. In addition, if you want to change the
current HP-X or HP-Y individual sample, you can do this directly from the
pseudoarray image by clicking on the [X] or [Y] part of the image
and then selecting the particular sample to use. Note that if the
mouse-over checkbox is enabled, then moving the mouse over the sample
names gives you the full sample name. Otherwise, the sample name may
be truncated.
3.6 Setting threshold values using the state-scroller sliders
You may filter genes using a variety of thresholding operations (see
the Filter menu (Section
2.4.3) to select any of these). For example, these include a spot
intensity (per channel) range [SI1:SI2], gene intensity range [I1:I2],
ratio range [R1:R2], Zdiff range [Z1:Z2], Coefficient Of Variation
(CV) range [0:1.0], p-value range [0:1.0] for the
t-test, etc. Additional threshold scrollers are used with the
clustering methods including the number of clusters (default 6), the
maximum cluster distance from a gene to another gene for the latter
to be considered in the same cluster, and the absolute difference
between HP-X and HP-Y.
Slider name          Associated with operation
Spot Intensity SI1   Filter by spot intensity range per channel
Spot Intensity SI2   Filter by spot intensity range per channel
Percent SI OK        Filter by percent of spots whose spot intensity is in threshold range criteria meets the AT LEAST or AT MOST criteria
Intensity I1         Filter by gene intensity range
Intensity I2         Filter by gene intensity range
Ratio R1             Filter by gene X/Y ratio range
Ratio R2             Filter by gene X/Y ratio range
Zdiff Z1             Filter by gene X-Y Zdiff range
Zdiff Z2             Filter by gene X-Y Zdiff range
Ratio CR1            Filter by Cy3/Cy5 gene X/Y ratio range
Ratio CR2            Filter by Cy3/Cy5 gene X/Y ratio range
Zdiff CZ1            Filter by gene (Cy3-Cy5) X-Y Zdiff range
Zdiff CZ2            Filter by gene (Cy3-Cy5) X-Y Zdiff range
p-Value              Filter by t-Test
Spot CV              Filter by Coefficient of Variation
Cluster Distance     Plot - cluster by expression similarity
# of Clusters        Plot - K-means clustering
Diff HP-XY           Filter by absolute difference (HP-X,HP-Y)
Spot Quality         Filter by continuous spot quality (If data available)
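The cascade of range tests described in Sections 3.4 and 3.6 can be pictured as a list of predicates applied one after another, where a gene survives only if it passes all of them. The sketch below is illustrative rather than MAExplorer's Filter code: the gene records, the field names and the inclusive range test are assumptions.

```python
def in_range(value, low, high):
    """True if low <= value <= high (inclusive bounds are an assumption)."""
    return low <= value <= high

def apply_filter(genes, tests):
    """Keep only the genes that pass every test in the cascade, in order."""
    surviving = list(genes)
    for test in tests:
        surviving = [g for g in surviving if test(g)]
    return surviving

# Hypothetical per-gene data: HP-X and HP-Y intensities and replicate-spot CV.
genes = [
    {"name": "g1", "x": 8.0, "y": 1.6, "cv": 0.10},
    {"name": "g2", "x": 9.0, "y": 1.0, "cv": 0.05},
    {"name": "g3", "x": 6.0, "y": 1.2, "cv": 0.60},
    {"name": "g4", "x": 2.0, "y": 2.0, "cv": 0.10},
]

tests = [
    lambda g: in_range(g["cv"], 0.0, 0.25),         # like the Spot CV slider
    lambda g: in_range(g["x"] / g["y"], 4.0, 8.0),  # like the Ratio R1 and R2 sliders
]

passed = apply_filter(genes, tests)
print([g["name"] for g in passed])
```

Each slider simply changes the bounds of one test, and the cascade is re-run to update the set of genes passing the Filter.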
3.7 Exporting report and plot data
Data is typically reported in MAExplorer in report and plot
windows. These may be saved using cut and paste if you are using
MAExplorer as an Applet or with "SaveAs" buttons on the popup
windows if you are running it as a stand-alone application. Reports
are then saved as text (.txt extension) files, and plots are saved as
GIF (.gif extension) files.
4. Status and Bugs of MAExplorer
This section discusses the status and known bugs in MAExplorer. It
also discusses dealing with the reporting of fatal errors so we can
resolve them.
4.1 Known Bugs in MAExplorer
Disclaimer: none of our code ever has bugs... :-). So despite
this, we are working on resolving these bugs and implementing planned
functionality. Here is a short non-inclusive list of known problems
that we are resolving. We welcome and encourage you to E-mail us with
any bugs that you find as well as suggestions for
capabilities you would like to see. As the new open-source MAEPlugins facility evolves, most
new (and some old) functionality will migrate to these plugins. Then
the user community can help maintain these analytic methods.
4.1.1 Browser Applet Bugs
4.1.3 Downloading and Installer Bugs
"Recommended version for your computer
Download installer for ...your OS..."?
Occasionally, we have seen instances where you cannot install
MAExplorer from within the Web browser. The solution is to explicitly
download the installer for your particular platform from the Available
Installers list, and then follow the instructions on running it.
limit stacksize unlimited
4.1.4 Computation speed and display Bugs
4.1.5 User state and login Status
4.1.6 Data file names Bug
4.1.7 Gene Sets Bugs
4.1.8 Clustering Bugs
4.1.9 Expression profile plots
4.1.10 Data conversion problems
4.1.11 Java Plugins bugs
4.2 Revision notes
This section lists the revision history and is useful for deciding
whether to upgrade to the most recent release. You may want to check
for the current
"Stable release" available on the MAExplorer Web site. That may be
different from the Stable
release listed in this copy of the Reference Manual. The "Beta
release" listed below the Stable release in the previous links is
experimental, but may generally be downloaded as it has more
functionality. If you experience problems, you can just reinstall the
Stable version.
Note: An archive of some of the
stable older releases is available on the NCI/LECB Web site for a
limited period.
Cvt2Mae Version 0.66: A bug has been fixed, making it easier
to automatically find the first row of spot data when scanning the
user's input data file. Previously, the user might have to
manually enter that starting data row number in the Edit Layout
wizard.
Renamed all previous references in the program to "hybridization probe" or
"hybridization sample probe" to the new term "hybridized sample" for
clarity. Changed many "HybProbe Menu" to "Samples Menu" and also many
menu selections as well as plot and report labels to reflect this
change. We are in the process of updating the manual computer screen
figures and PDF slide presentations so that "hybridization sample
probe" is shown as "hybridized sample". Also fixed other minor
problems including fixing the inverted color scale for the
"Pseudocolor Red(Cy5)-Yellow-Green(Cy3) Cy5+Cy3 ratio or Zdiff"
command.
Version 0.94.01:
Major version release.
Moved "Cluster Plots" submenu of the "Plots" submenu
up one level in the "Analysis" menu.
Version 0.93.01: Major version release.
Optimized colors in grayscale display for
"Pseudocolor (HP-X,HP-Y) 'sets' p-value". Fixed error and optimized
t-test computation.
Version 0.92.22: Last Stable Release.
This corrects a few minor bugs including crashing when starting up
an empty database (that bug was introduced sometime in the last month).
It is the first release with some reorganized code.
Version 0.91.01: Major version release.
(very Beta) Changing the convention to use "MasterID" as the master gene
index. This is more flexible than using the CloneID as the
master gene index. Added LocusLink.
Version 0.90.01: Major version release.
Because MAExplorer can be used with both spotted clone arrays and
oligo arrays, we renamed clones as genes (except where Clone ID is
used) in both the MAExplorer program and in the Reference Manual.
Version 0.89.01: Major version release.
There is not enough memory to cluster current filtered clones. Options:
1. reduce the number of filtered clones and try again, or,
2. disable cluster-cache (Clustering menu) - will be VERY slow.
4.3 Web Browser problems when running MAExplorer as an applet
Because MAExplorer is a large system, there may be occasional
problems running it in some Web browsers on some operating systems. We
recommend you run MAExplorer as a stand-alone application as it is
more robust.
4.4 Handling fatal error reporting (i.e. DRYROT errors)
If you encounter a fatal error that is detected by MAExplorer, it will
pop up an error reporting window. We call this a "DRYROT" error (thanks
to "S.A.I.L." - Stanford AI Lab) because something is wrong in the
program or in the user's data files from which it cannot
recover. This type of error should not have happened. Please save and
e-mail the report to us so we can try to fix the bug or diagnose the
problem. The following figure shows an example of part of a DRYROT
error report.
Release Archive for stand-alone MicroArray Explorer on NCI/LECB
This is an archive of some of the older stable versions of the
MAExplorer stand-alone application program. These are the full
installers which include the Java JDKs for all operating systems as
well as the MGAP database. The user reference manual (available as a
zip file) specific for that version is also included. After a while,
we will remove some of the older releases. To find what the current
and beta releases are, see the Install home page. The
changes between releases are listed in Section 4.2 Revision Notes.
Release    Release Date   Manual (.zip) for Release
0.96.02    07-02-2002     -
0.95.20    05-31-2002     MaeRefMan.zip (10Mb)
0.95.16    05-24-2002     MaeRefMan.zip (10Mb)
0.95.04    03-22-2002     MaeRefMan.zip (10Mb)
Acknowledgements
Primary contributors to MAExplorer were Peter Lemkin
(LECB/NCI), Greg Thornwall (SAIC/FCRDC) in the Laboratory of Experimental and
Computational Biology, NCI/NIH, and Jai Evans (DECA/CIT/NIH).
Kevin Becker and Chris Cheadle (NIA/NIH),
Breast Cancer Think Tank (NCI),
Damien Chaussabel (NIAID),
Terry Clark and Josef Jurrek (U. Chicago),
Mitko Dimitrov (LECB/NCI),
Jai Evans and Chris Santos (DECA/CIT/NIH),
Troy Moore (Research Genetics),
Peter Munson (CIT/NIH),
Alan Li (SourceForge),
Quang Tri Nguyen (LECB+LCRC/NCI),
John Powell and Esther Asaki (CIT/NIH),
Eric Shen (U. Arizona),
Moshe Shani (Agr. U. Israel),
Richard Simon (NCI/NIH),
Bob Stephens and Gary Smithers (ABCC/FCRDC),
Ron Taylor (U. Colorado),
Mark Vawter (NIDA/NIH, UC-Irvine),
John Weinstein (LMP/NCI), David Kane (SRA/NCI), Ajay (LMP/NCI),
and to many others for useful discussions and suggestions that have
helped improve the MAExplorer's capabilities and usability.
References to related exploratory data analysis methods and MAExplorer
This short list of references is limited to a few related to
exploratory data analysis methods for microarrays as they relate to
MAExplorer. It is not meant to be inclusive. More extensive
lists of references to many of the array preparation and data mining
methods can be found in some of these papers and on the Internet.
Appendix A. Short tutorial for MAExplorer
This tutorial is for use with MAExplorer, an exploratory data analysis
facility for microarray DNA databases. It may be used with any
MAExplorer database. As with all tutorials, it is only a starting
point - in this case for understanding the
data mining analysis environment. Try out new options on your own,
you can't break anything :-).
NOTE: THIS APPENDIX IS BEING REVISED AND EXPANDED...
A.1 Demonstration data
Note that the downloadable MAExplorer stand-alone application includes
a subset of 50 hybridized samples from the MGAP database including a
number of startup files for that data (see the list of startup .mae
files included in the download installation).
A.2 General instructions:
Throughout this tutorial we refer to condition X and condition Y.
These are different hybridized samples in the particular database you
have loaded. For example, in the MGAP database X might be lactation
and Y might be pregnancy. X and Y 'sets' are multiple samples of these
two conditions.
If the particular samples you want to analyze are not listed in that
example, after it starts you will be able to add samples you do want
and remove samples you don't want, regardless of which example was
initially used, provided the "Samples" database contains additional
hybridized samples.
A.3 Self-guided tutorial of MAExplorer - notation and examples
The following is a self-guided tutorial
(you issue the commands) that illustrates some of the data
analysis capabilities. In the following examples, the notation "go
to A:B:C" means go to pull-down menu A, then submenu B, and then make
selection C. "Selecting a gene" from the microarray image or scatter
plot means clicking on a spot in the pseudoarray image or a point in
any of the plots.
A.3.1 Review of types of gene data available in the database
step 1: go to Analysis: GeneClass: All genes
the array shows all genes with white circles.
step 2: go to Analysis: GeneClass: All named genes
the array shows named genes with white circles.
step 3: go to Analysis: GeneClass: ESTs similar to genes
the array shows ESTs similar to named genes with white circles.
step 4: go to Analysis: GeneClass: ESTs
the array shows unknown ESTs with white circles.
step 5: go to Analysis: GeneClass: All genes and ESTs
the array shows all named genes and all ESTs with white circles.
step 6: go to Analysis: GeneClass: Replicate genes
the array shows replicate genes having at least 2 copies in the
array with white circles.
step 7: go to Analysis: GeneClass: Calibration DNA
the array shows calibration DNA (if present) with white circles.
step 8: go to Analysis: GeneClass: Your plates
the array shows clones from user's plates (if present) with white circles.
A.3.1.1 Analysis of the expression of a single known gene
ratio between two conditions X and Y (HP-X, HP-Y)
expression profile of a set of conditions (HP-E) (see
Example A.3.1.7)
step 1: click on the blue "Enter gene name" button to pop up a name
entry window
step 2: start typing gene name into blue text entry window
step 3: once gene names appear, click on gene of choice
step 4: press "Done" button in pop up window
A yellow circle will define the gene as the "current gene" in the microarray
pseudoarray image (info on gene is also provided in the status area above the array).
If there are replicate grids in the array (HP), the left and right fields of repeated genes are denoted
by F1 and F2. The mean(HP-X,HP-Y) values and the (HP-X/HP-Y)
values for the specified gene are reported.
step 5: alternatively, click on an array spot of choice to define any gene
in the array as the new current gene
A.3.1.2 Find a subset of genes with a common substring (e.g. *ONCO*)
step 1: click on the blue "Enter gene name" button to pop up a name
entry window
step 2: start typing "*ONCO*" (without the quotes) into blue
text entry window
step 3: once gene names appear, press "Set E.G.L." button in pop up
window
Magenta squares will indicate these genes in the pseudoarray image.
These include the 'onco'genes and the proto-'onco'genes
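The effect of a wildcard pattern such as *ONCO* can be pictured with Python's standard `fnmatch` module. This is only an analogy for how such a pattern selects gene names. The list of names is made up for the example, and the case-insensitive comparison is an assumption, not a statement about how MAExplorer matches names.

```python
from fnmatch import fnmatchcase

def match_gene_names(names, pattern):
    """Return the names matching a '*' wildcard pattern, ignoring case."""
    pat = pattern.upper()
    # Upper-casing both sides makes the comparison case-insensitive (assumption).
    return [name for name in names if fnmatchcase(name.upper(), pat)]

# Hypothetical gene names for illustration.
names = [
    "Proto-oncogene c-Fos",
    "Carbonic anhydrase 3",
    "Oncostatin M",
    "ribosomal protein L41",
]
matches = match_gene_names(names, "*ONCO*")
print(matches)
```

With a leading and trailing '*', the pattern matches the substring anywhere in the name, which is why both oncogenes and proto-oncogenes are found.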
A.3.1.3 Two conditions - scatter plots:
Create a scatter plot of two hybridized samples where condition X data
is on the X axis and condition Y data on the Y axis.
then click on yellow circle in scatter plot to get HP-X/HP-Y
ratio for the gene
step 2: click on any point in the scatter plot
this also defines any gene in the plot as the new
current gene
step 3: zoom in on a region of the plot using the vertical or
horizontal scroll bars
step 4: click on another point in the scatter plot to get the
HP-X/HP-Y ratio for another gene
step 5: press "Close" button to remove pop up window
A.3.1.4 Scatter plot of Cy3 vs Cy5 or replicate spots (F1 vs F2) of one sample
Create a scatter plot of Cy3 vs Cy5 channels or replicate spot F1, F2
data if your database contains (Cy3,Cy5) ratio data or it contains
replicate spot fields (F1,F2).
or go to Analysis: Plot: Scatter plots: F1 vs. F2
Then, click on green circle in scatter plot to get Cy3/Cy5
ratio for the gene
or F1/F2 ratio for replicate spots for that gene
step 2: click on any point in the scatter plot
this also defines any gene in the plot as the new
current gene
step 3: zoom in on a region of the plot using the vertical or
horizontal scroll bars
step 4: click on another point in the scatter plot to get the
HP-X/HP-Y ratio for another gene
step 6': select the samples you wish to swap and press "Done". This
enables you to see the swapped results in the scatter plot
step 7: press "Close" button to remove pop up window
A.3.1.5 Filter by expression ratio between two conditions X and Y
step 1: go to Analysis: Plot: Histograms: HP-X/HP-Y
the histogram shows the ratios
step 2: move pop up plot so you can see it and the array
simultaneously
step 3: choose (click on) a ratio bin
genes filtered by the ratio range of the bin will light up on the array ('+'s)
step 4: click on different bin in the histogram to select
another bin
step 5: click on word "Freq" on left in histogram to remove the
histogram bin filter
You can filter out low intensity genes by filtering on the spot intensity range, as in the next example.
A.3.1.6 Filter by spot intensity range
step 1: go to Analysis: Filter: Filter by spot intensity [SI1:SI2]
sliders: Use spot intensity [SI1:SI2] sliders
step 2: adjust intensity lower bound (SI1) to remove low ratio
genes
step 3: when done, remove the 'Filter by intensity sliders' by
toggling it off (redo step 1 to toggle it off)
step 4: repeat steps 1-3, but this time use Filter : Filter
by [I1:I2] sliders :
Use spot intensity (or Cy3/Cy5) [I1:I2] sliders
A.3.1.7 Multiple conditions - expression profile plots of HP-E data:
step 1: go to Analysis: Plot: Expression profile: Display a gene's
expression profile
step 2: after the expression profile window pops up, click on a
gene in array to see its profile
step 3: click on a line in the profile plot to see its intensity
step 4: click on a different gene in the array to see its
profile
step 5: press "Show HPs" button to see the list of samples used
step 6: press "Close" button to remove pop up windows
A.3.2 Changing the normalization between hybridized samples
You may change the normalization method used to scale data between
hybridized samples so they may be compared.
A.3.2.1 Set normalization
step 1: go to Analysis: Normalization: Median intensity
step 2: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
to see the effect of normalization on the scatter plot.
Note how outliers appear.
step 3: go to Analysis: Normalization: Zscore of intensity
step 4: go to Analysis: Normalization: Zscore of log intensity, stdDev
step 5: go to Analysis: Normalization: Unnormalized
this does not scale data between samples.
step 6: go to Analysis: Normalization: Median intensity
this leaves the normalization method in Median mode.
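To see why normalization changes the scatter plot, it can help to look at what two of these methods do to a single sample's intensities. The sketch below is a simplified illustration and not MAExplorer's implementation: it assumes that median normalization divides each intensity by the sample's median and that the Z-score uses the sample mean and sample standard deviation. The intensities are made up.

```python
import statistics

def median_normalize(intensities):
    """Scale a sample so that its median intensity becomes 1.0 (assumed definition)."""
    med = statistics.median(intensities)
    return [v / med for v in intensities]

def zscore_normalize(intensities):
    """Express each intensity as standard deviations from the sample mean."""
    mean = statistics.mean(intensities)
    sd = statistics.stdev(intensities)  # sample S.D. (n-1); an assumption
    return [(v - mean) / sd for v in intensities]

# Hypothetical raw spot intensities for one hybridized sample.
raw = [100.0, 200.0, 400.0]
by_median = median_normalize(raw)
by_zscore = zscore_normalize([1.0, 2.0, 3.0])
print(by_median, by_zscore)
```

Median-normalized values stay positive and are compared as ratios, while Z-scores are centered on zero and are compared as differences (Zdiff).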
A.3.3 Analysis of the expression profiles of gene classes
You may restrict the set of genes by Gene Class. Several built
in gene classes are defined. You may also set up additional ones and
filter by those (not covered in this short tutorial).
A.3.3.1 Filter by gene class membership
step 1: go to Analysis: GeneClass: All known genes
the array only shows named genes (additional gene subclasses
are being added)
step 2: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
to see the two condition expression of just these genes
step 3: go to Analysis: Plot: Expression profiles: Display Filtered
genes expression profiles
to see the multiple condition expression of just these genes. This may take a
while if there are many genes
step 4: you can click on a line in any of the plots to see the
samples' intensity value for that gene
step 5: when done, press "Close" button in all pop up plot windows
A.3.3.2 Gene Reports
step 1: go to Analysis: Report: Gene reports: Filtered genes: Genes passing Filter
Clicking on a blue entry will bring up the I.M.A.G.E., dbEST, UniGene, GenBank,
LocusLink, or mAdb Clone database in a pop-up Web page
step 2: press "Close" button in report, and close this pop up Web page
step 3: go to Analysis: Report: Table format: Tab-delimited
to enable creating Excel-compatible reports
A.3.3.3 Exporting Gene Reports to Excel
step 1: repeat step 1 of the Gene Report, but this time
to make a text-formatted report
step 2: cut the text from this window and paste it into an Excel window.
This is useful for exporting data if you are on a Windows PC
step 3: go to Analysis: Filter: all genes
to restore it to all of the genes from all named genes
step 4: go to Analysis: Report: Table format: Spreadsheet
step 5: press "Close" button in report
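A tab-delimited report is plain text with one row per line and a tab between columns, which is why it pastes cleanly into a spreadsheet. The sketch below writes such text with Python's standard `csv` module. The column names are hypothetical, and the two rows reuse the CloneID, gene name and mean X/Y ratio values from the examples in Section 3.3.

```python
import csv
import io

def to_tab_delimited(header, rows):
    """Render a header and rows as tab-delimited text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

header = ["CloneID", "GeneName", "X/Y ratio"]  # hypothetical column names
rows = [
    [1248228, "Mus musculus ribosomal protein L41 mRNA, complete cds", 0.787],
    [1382656, "Carbonic anhydrase 3", 4.041],
]
text = to_tab_delimited(header, rows)
print(text)
```

Because the separator is a tab, commas inside gene names do not split a column, which is a common problem with comma-separated exports.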
A.3.4 Analysis of the expression profile of multiple hand picked genes
Users can manually define a set of genes which are kept in the
Edited Gene List (E.G.L.). Various operations can then use the
EGL to restrict the set of data being analyzed.
A.3.4.1 Define a list of edited genes, then plot all their expression
profiles at one time
step 1: go to View: Show 'Edited Gene List'
this turns on the
'Edited Gene List' magenta square box overlays
step 2: hold CONTROL key and click on genes in array to add a gene
step 2': hold SHIFT key and click on genes in array to delete a gene.
This lets you edit a list of genes. It also works when clicking in a scatter plot
step 3: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
to see the Edited Gene List in the scatter plot
step 4: try defining (or removing) E.G.L. genes in the scatter
plot by holding the
CONTROL (or SHIFT) key when clicking on points in the
scatter plot
A.3.4.2 Filtering by edited gene list
step 1: go to Analysis: Filter: Filter by 'Edited Gene List'
step 2: go to Analysis: Plot: Expression profiles: Display Filtered genes expression profiles
scroll through the plots to see all of the profiles
step 3: go to Analysis: Filter: Filter by 'edited gene list'
this turns off the 'edited gene list' filter
step 4: press "Close" button in expression profiles window
A.3.4.3 Report of edited gene list
step 1: go to Analysis: Report: Gene report: genes in 'edited gene list'
reports edited genes
step 2: press "Close" button in report
step 3: go to Analysis: Filter: Filter by 'edited gene list'
this turns off the 'edited gene list' filter
step 4: go to View: Show 'edited gene list'
this turns off the 'edited gene list' squares overlay
A.3.5 Identify a cluster of genes with similar expression
profile to the current selected gene
step 1: go to GeneClasses: All named genes and ESTs
step 2: go to Analysis: Plot: Cluster plots: Cluster genes with expression profiles similar to current gene
this will pop up a cluster summary and cluster distance slider control window.
Move the summary and slider windows so you can see all 3 windows. The size of
the cyan boxes on similar genes in the pseudoarray is proportional to the similarity.
Adjust the cluster distance slider to smaller values and note how
the number of genes clustered decreases.
It should be set for a reasonable number considering the material you
are analyzing.
step 3: select (click on) a new current gene
the genes which belong to that cluster are labeled in the array with cyan boxes
and are defined as the "current cluster". The current gene you click on has
a green circle around it
step 4: press "Cluster Report" button in the cluster summary
this pops up a Gene Report for the clustered genes
step 5: press "Close" button in the report
step 6: press "EP plot" button in the cluster summary
this pops up a scrollable list of expression profile plots sorted by similarity
to the current selected gene.
step 7: press "Close" button in the report
step 8: press "Close" button in the cluster summary
A.3.6 Identify clusters of genes with similar expression under
various conditions using data mining filters
step 1: go to GeneClasses: ESTs similar to genes
step 2: go to Analysis: Plot: Cluster plots: K-means clustering of
gene expression profiles
this will pop up a cluster summary and slider control window. Move the
summary
and slider windows so you can see all 3 windows. The size of the
magenta circles in the array is proportional to # genes/cluster
step 3: select (click on) a new current gene
the genes which belong to that cluster are labeled in the array with tiny green
numbers and are defined as the "current cluster". The current gene you click
on has a green circle around it
step 4: go to View: Show 'edited gene list'
genes in the current cluster were also copied to the edited gene
list
step 5: go to Analysis: Report: Gene report: genes in 'edited gene list'
reports genes in the current cluster
step 6: press "Close" button in report
step 7: go to View: Show 'edited gene list'
this turns off the 'edited gene list' squares overlay
A.3.6.1 Varying the number of clusters
step 1: vary the "# of clusters" slider value from 6 to 10, then 20
note the number of clusters changes and the gene cluster composition also
changes
A.3.6.2 Defining a new cluster "seed" to recluster the genes
step 1: select a new current gene in array and press the
"Recompute clusters" button
this recomputes the clusters using the current gene as the new seed gene
A.3.6.3 Cluster expression profile plots
step 1: press "EP plot" button and scroll down the list after they appear
the primary nodes for each cluster are indicated with red labels
in the set of
profiles, and the other genes are labeled with their cluster number
step 2: press "Mean EP plot" button and scroll down the list after they appear
these are the mean expression plots of the primary nodes clusters.
A.3.6.4 Report of all clusters
step 1: press the "Cluster-Report" button to get a sorted cluster list
scroll the spreadsheet to the right to see the cluster statistics
step 2: press the "Mn-Cluster-Report" button to get a sorted cluster list
scroll the spreadsheet to the right to see the mean expression profiles
step 3: press "Close" button in pop up windows
A.3.6.5 Current cluster in scatter plot
step 1: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y
step 2: move the plot so you can see both scatter plot and array
step 3: click on a gene in the cluster or on spots in the scatter plot
note that the green cluster numbers are drawn in the scatter plot
step 4: go to Edit: Sets of genes : Save 'Edited gene list' as gene sets
this will pop up a dialog box requesting "Enter new gene set name"
step 5: type "Genes in current cluster class"
this will save the current cluster in a gene set.
This gene set will
be used in the next example
step 6: press "Close" button in pop up windows
step 7: (optionally) investigate hierarchical clustering with clustergrams and
dendrograms by going to Plot : Cluster Plots : Hierarchical clustering plot for HP-E
A.3.7 User Gene Set operations
You may manipulate sets of genes. Some of these are predefined for you
by the database (eg. All named genes, ESTs, etc.). Others are defined
by particular operations (E.G.L., clustering, etc.), and lastly others
may be defined by you using logical operations on these sets (OR, AND,
DIFFERENCE).
A.3.7.1 List of the current gene sets
step 1: go to Edit: Sets of genes : List saved gene sets
this lists the current list of gene sets
step 2: Change the E.G.L. set
of genes and note how the # of E.G.L. genes changes in the list.
You can add (remove) genes to the E.G.L. by clicking on a spot in the array while the
CONTROL (SHIFT) key is held down.
A.3.7.2 Filter by user defined gene set
step 1: go to Edit: Sets of genes : Set 'User Filter Gene Set' (for Filter)
this will request a gene set to use with the Filter in a pop up dialog box.
Enter gene set # for the set for "Genes in current cluster class" which you saved
in the previous example.
then press "Ok" in the dialog box.
step 2: go to Analysis: GeneClass: All genes and ESTs
this resets the filter to look at all genes and ESTs
step 3: go to Analysis: Filter: Filter by 'User Gene Set' membership
this restricts the genes to the saved current cluster in the previous
example
A.3.7.3 Gene set operations
step 1: go to Edit: Sets of genes : OR (Union) of 2 gene sets
this will request 3 gene set names in a pop up dialog box.
Enter set # for (All known genes) for the 1st gene set name,
Enter set # for (Genes in current cluster class) for the 2nd gene set name,
Enter "Union of known genes and genes in current cluster" for new gene set name.
then press "Ok" in the dialog box.
this computes the union of the two gene sets into a new gene set
step 2: go to Edit: Sets of genes : Set 'User Filter Gene Set'
this will reset the 'User Filter Gene Set' for the Filter in a pop up dialog box.
Enter the set number or the beginning of the set name 'Union' that is the
set for "Union of known genes and genes in current cluster" just saved.
step 3: try saving other Filtered genes sets and doing other
gene set operations.
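The three logical operations named in A.3.7 (OR, AND, DIFFERENCE) behave like ordinary set operations. The following is only an illustrative sketch outside of MAExplorer, using Python sets; the gene names and the two set variables are hypothetical placeholders, not data from the tutorial database.

```python
# Hypothetical gene sets standing in for two saved MAExplorer gene sets.
all_known_genes = {"geneA", "geneB", "geneC"}
current_cluster = {"geneC", "geneD"}

# OR (Union): genes that are in either set.
union = all_known_genes | current_cluster

# AND (Intersection): genes that are in both sets.
intersection = all_known_genes & current_cluster

# DIFFERENCE: genes in the first set that are not in the second.
difference = all_known_genes - current_cluster

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

The union is the operation used in step 1 above; the other two correspond to the AND and DIFFERENCE entries of the same menu.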
A.4 Additional tutorials
If you wish to investigate MAExplorer in more detail, try some of the
suggested examples in the advanced
tutorial (Appendix B) in the reference manual.
Appendix B. Advanced tutorial for MAExplorer
There are a number of things you may do in this facility. We wrote
this advanced tutorial to help demonstrate some of its capabilities. A
short tutorial (Appendix A)
is also available and we recommend doing it before attempting the
advanced tutorial. Sources of startup data to use with the tutorials
are listed in the short tutorial. As with all tutorials, they are only
starting points for getting you into the analysis environment - try
out new options on your own, you can't break anything :-).
NOTE: THIS APPENDIX IS BEING REVISED...
Here are some things to try
When first started, MAExplorer loads some initial data it needs as well as the
particular hybridized samples you specified. After MAExplorer starts,
it displays "Ready - click on a gene to query
database" and the menus become active.
Appendix C. Use of MAExplorer with user's microarray data
This section discusses the use of MAExplorer to convert microarray
data from a variety of sources, including various types of labeling
(33P-labeled, biotin-labeled, or Cy3/Cy5 ratio-labeled) on
spotted membranes, glass slides, or oligo-chips of different
geometries and numbers of duplicate spots/gene.
Note: This appendix contains a "computerese" description of how
to use MAExplorer with your array data. The user-friendly "wizard"
tool Cvt2Mae makes it easier to convert your data to the format
required by MAExplorer.
Directory (i.e. folder) structure of stand-alone databases
When running as a stand-alone application, MAExplorer assumes that
data from a local computer has a specific directory structure. The
required and optional directories (also called "folders" on some
operating systems) and files they contain are diagramed here from a
database project directory in your file system. The notation
"/folder-name" indicates that "folder-name" is a folder inside of the
project.
(specific database directories and files they contain)
/Cache
    (copies of any data files saved from Web DB access)
/Config
    MaExplorerConfig.txt
    SamplesDB.txt
    GIPO-db.txt
/MAE
    (set of startup database files).mae
/Images
    (set of original or sampled array .jpg images) (optional)
/Plugins
    (optional set of .jar or .class MAEPlugin files)
/Quant
    (set of spot quantified data files).quant
/Report
    (set of .txt and .gif report files generated using SaveAs)
/State
    (set of gene set files).cbs and
    (set of condition list files).hbl generated using Save DB
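As a quick check before starting MAExplorer on a hand-built project, the layout above can be inspected with a few lines of script. This is a minimal sketch and not part of MAExplorer: the function name inspect_project is hypothetical, and because the text does not say which folders must exist beforehand (some are filled in at run-time), the sketch only reports what is present.

```python
from pathlib import Path

# Folder and file names are taken from the layout listed above.
PROJECT_FOLDERS = ["Cache", "Config", "MAE", "Images", "Plugins",
                   "Quant", "Report", "State"]
CONFIG_FILES = ["MaExplorerConfig.txt", "SamplesDB.txt", "GIPO-db.txt"]


def inspect_project(project_dir):
    """Return {name: exists} for each listed folder and Config file."""
    root = Path(project_dir)
    found = {name: (root / name).is_dir() for name in PROJECT_FOLDERS}
    for file_name in CONFIG_FILES:
        found["Config/" + file_name] = (root / "Config" / file_name).is_file()
    return found


if __name__ == "__main__":
    # Report on the current directory as if it were a project directory.
    for name, exists in inspect_project(".").items():
        print(("found    " if exists else "missing  ") + name)
```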
Examples of some of the database files required by MAExplorer
These can serve as examples when creating your
own database files. When the MAExplorer converter tool, Cvt2Mae, is
released it will eliminate the need for manually editing your database
files.
In addition, examples of the (Config/, Quant/ and MAE/) files
needed for various types of arrays are available at:
Additional directories used at run-time
When running MAExplorer as a stand-alone application, you may save
data on the disk. Text reports and plot graphics windows are saved as
".txt" text and ".gif" image files when the user uses the "SaveAs"
button in the respective popup windows. These files are saved in the
"Report" subdirectory.
Tools for automating the construction of a local stand-alone database
Software tools for aiding the construction of local stand-alone
databases from vendor supplied GIPOs and spot quantification files are
not available at this time, but will be made available in the future.
Manually constructing a local stand-alone database
Although the Cvt2Mae converter tool can convert many files, you could
alternatively build these files manually. We suggest using Excel or
your favorite RDBMS system to manipulate the data. At the end, save
the data into files with tab-delimited fields with the above file
extensions (i.e. .txt, .quant, .mae). The layout of these files and
what is optional and what is not is described in detail (maybe too
much!) below. You could use an ASCII file text editor instead of Excel
(such as Wordpad, Emacs, etc.) - but be careful not to add or
delete tabs since this will destroy the integrity of the database
tables. Be consistent in your file names; avoid spaces; use ASCII
characters in file names that are system independent (i.e. A-Z,
a-z, 0-9, "-", "+", "_"); use either "-" or "_" or both.
C.1 Creating quantified spot data files from hybridized sample arrays
Quantified spot data from images scanned from hybridized sample arrays
may be created using a variety of software programs. Discussion of
these is beyond the scope of this manual. However, several of these
including Pathways 2.01, ImageQuant-NT, and others generate
tab-delimited text files. These files may be used directly as the
quantified spot files required by MAExplorer, or simplified first (by
removing unused or redundant data fields). Typically, the files are
named (or renamed) to that of the sample to distinguish them from each
other and a .quant file extension assigned instead of the
.txt file extension. Other programs generate tab-delimited
files that could be mapped to our .quant file formats. (For example,
the NCI/CIT mAdb system
generates such a mapping for GenePix(TM) and ScanArray
formatted data.)
C.1.1 Color and prefix notation for the following tables:
(req), (opt), (future)
The following tables list parameters and some typical values
that might be included in the configuration and quantification files.
These examples illustrate the variety of parameters or fields with
examples of values that might be used. Required parameters are
in black with "(req)" prefix. Optional parameters are
indicated in blue with an "(opt)"
prefix. Optional parameters are not normally specified and are
generated in the .mae file when you save the state of a data
exploration. Parameters that might be used with Cy3/Cy5 ratio data
are indicated in magenta with a "(ratio)"
prefix. Future options not currently used are indicated in
green with a "(future)" prefix. Alternative
options are indicated in red with an "(alt)"
prefix.
C.2 Table of samples that can be loaded into MAExplorer
The samples available to be analyzed in a database are listed in a
samples database table. This lists all samples that
could be loaded. The user will then select a subset of these to
be analyzed. The selection is done either in preset Web startup pages,
or with the stand-alone application .mae startup files, or at
run-time by selecting new entries from the Samples pull-down
menus. Extra information may be provided to MAExplorer for each sample
through this table and will be available for the Sample Array report in
Section 2.4.6.1.
Sample_ID   Project        Database_File
control 1   breastCancer   control1
control 2   breastCancer   control2
control 3   breastCancer   control3
tumor 1     breastCancer   tumor1
tumor 2     breastCancer   tumor2
tumor 3     breastCancer   tumor3

Sample_ID   Project        Database_File   Database_ID
control 1   breastCancer   control1        270314
control 2   breastCancer   control2        270315
control 3   breastCancer   control3        270316
tumor 1     breastCancer   tumor1          270317
tumor 2     breastCancer   tumor2          270318
tumor 3     breastCancer   tumor3          270319
Field
Description
(req) Sample_ID
descriptive name of the sample, free text.
[Note: an older deprecated name is "Membrane_ID"]
(req) Project
project to which the sample belongs. Used for login protection
and grouping of samples
(req) Database_File
name of the .quant spot database file,
no spaces. This is the file name for the sample.
(opt) DatabaseFileID
database file ID corresponding to Database_File
and Sample_ID. For use with RDBMS Web databases (e.g. experiment id #).
NOTE: if you are encoding auxiliary data files using this identifier,
e.g. sampled array images in the Images/ directory, then this field
is required if you want to access those images.
(opt) Menu_Source_Name
name of the Sample submenu to which this sample belongs.
You could use the word "Default" or leave out this entry if you do not
want to use sub menus.
(opt) Orig_File_Name
if applicable. The original file name and sample name
if the data was split out from a multiple hybridized sample file.
(opt) Strain
if applicable
(opt) Source
if applicable
(opt) Probe
if applicable
(opt) Stage
if applicable (eg, developmental stage, dose,
time point, etc)
(opt) Login
TRUE if login is required with a Web server,
else blank. This is used primarily with the Applet when interacting
with a Web server
(opt) GeneCard_URL
GeneCard ID if applicable
(opt) Histology_URL
(e.g. MGAP) histology DB Web page if applicable
(opt) Model_URL
(e.g. MGAP) mouse model database Web page if applicable
(opt) BGLow
global low value of array background intensity
(opt) BGAvg
global average value of array background intensity
(opt) BGRms
global root-mean-square value of array background intensity
(opt) Contributor
name of researcher submitting the sample
(opt) Contrib_Institute
researcher's organization
(opt) Submission_Date
when submitted
(opt) Exposure
minutes or hours of radiolabel or fluorescent
exposure
(opt) Sample_Nbr
internal sample number
(opt) FilterType
name of the array layout
(opt) FilterType_Description
additional description of array layout
(opt) Comments
details describing sample
(opt) Researcher
researcher performing the hybridization
(opt) SampleGrid
serial number of the array or grid or internal
laboratory numbering. (Useful if reusing arrays etc)
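The required fields above lend themselves to a simple check on a hand-edited samples table. The following is a minimal sketch, assuming the table is tab-delimited with a header row as recommended earlier in this appendix; the function name read_samples_table is hypothetical and is not part of MAExplorer.

```python
import csv
import io

# The three fields marked (req) in the table above.
REQUIRED_FIELDS = ("Sample_ID", "Project", "Database_File")


def read_samples_table(text):
    """Parse a tab-delimited samples table and check the (req) columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    rows = list(reader)
    for row in rows:
        # Database_File is a file name and must not contain spaces.
        if " " in row["Database_File"]:
            raise ValueError("space in Database_File: " + row["Database_File"])
    return rows


example = (
    "Sample_ID\tProject\tDatabase_File\n"
    "control 1\tbreastCancer\tcontrol1\n"
    "tumor 1\tbreastCancer\ttumor1\n"
)
samples = read_samples_table(example)
for sample in samples:
    print(sample["Sample_ID"], "->", sample["Database_File"])
```

Note that Sample_ID is free text and may contain spaces, which is why the columns must be separated by tabs rather than by blanks.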
C.3 Quantified spot data file formats
MAExplorer has been designed to be able to read quantified spot data
from a variety of spot analysis software packages. So the data file
format is very flexible. Essentially, a data file contains one or
more spot intensity values per gene in each row of the data file. A
spot location is specified by a GIPO (field#, grid#, grid column#,
grid row#) 4-tuple with the field value optional. Note: a "grid" is
sometimes called a "block" or a "patch". If the field specification is
omitted and there are duplicate spots in multiple fields of grids,
then it is defined implicitly. In that case, the corresponding spot
intensity data for each field for a gene is specified as separate
columns going from left to right. The (grid#, column#, row#) part of
the specification may be encoded several ways: a) explicitly as
(grid#, column#, row#) or b) NAME_GRC.
grid  grid col  grid row  RawIntensity  Background
1     1         1         2226.8        32.6
1     1         2         1234.8        25.6
. . .
10    25        28        3333.8        23.6

grid  grid col  grid row  RawIntensity1  Background1  RawIntensity2  Background2
1     1         1         2226.8         32.6         2345.9         39.4
1     1         2         1234.8         25.6         1245.9         39.4
. . .
10    25        28        3333.8         23.6         3345.9         25.4

field  grid  grid col  grid row  RawIntensity  Background
1      1     1         1         2226.8        32.6
1      1     1         2         1234.8        25.6
. . .
1      10    25        28        3333.8        23.6
. . .
2      1     1         1         2226.8        39.4
2      1     1         2         1234.8        39.4
. . .
2      10    25        28        3333.8        25.4

NAME_GRC       RawIntensity1  RawIntensity2
GRID- 1-R1C1   2126.500       3662.350
GRID- 1-R2C1   2311.430       3306.290
GRID- 1-R3C1   3696.470       5780.310
GRID- 1-R4C1   3167.450       5245.440
. . .

grid  grid col  grid row  Cy3     Cy3Bkgd  Cy5     Cy5Bkgd
1     1         1         2226.8  32.6     2345.9  39.4
1     1         2         1234.8  25.6     1245.9  39.4
. . .
10    25        28        3333.8  23.6     3345.9  25.4
Field
Description
(opt) field
field for duplicate genes if using single
'RawIntensity' value/Row
(req) grid
grid name (either A,B,C,... or 1,2,3,... )
(req) grid col
column within a grid
(req) grid row
row within a grid
(opt+alt) NAME_GRC
(alternative specification of "grid, grid col, grid row").
(req) RawIntensity1
intensity value for field 1. Use this form if there
is more than 1 intensity value/row.
(req) RawIntensity2
intensity value for field 2
(required if it exists and for Cy3, Cy5 data)
(req+alt) RawIntensity
intensity value for field 1,
if only one field used
(opt) Background1
background intensity value for field 1
(opt) Background2
background intensity value for field 2
(if it exists for F1,F2 data or Cy3, Cy5 data)
(opt+alt) Background
background intensity value for field 1,
if only one field used
(opt) QualCheck
quality check for data
indicating "bad" spots or genes. Current codes are
listed in the Table C.4.2 of QualCheck semantics
(opt) DetValue
spot data detection value quality. This could
be the Affymetrix MAS5.0 "Detection p-value" or some other metric
correlated with spot detection quality in the range of [0.0 : 1.0].
Field
Description
(req) Cy3RI
RawIntensity1 value for Cy3
(req) Cy5RI
RawIntensity2 value for Cy5
(opt) Cy3Bkgrd
Background1 value for Cy3
(opt) Cy5Bkgrd
Background2 value for Cy5
(opt) Cy3
RawIntensity1 value for Cy3
(opt) Cy5
RawIntensity2 value for Cy5
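The alternative column names in the two field tables above amount to a renaming onto one canonical set of fields. The following is a sketch of that renaming, not MAExplorer's own code; the names ALIASES and canonical_row are hypothetical. The background aliases are spelled as in the field table ("Cy3Bkgrd", "Cy5Bkgrd"); the sample data table earlier in this section spells them "Cy3Bkgd" and "Cy5Bkgd", so check which spelling your files use.

```python
# Alternative column name -> canonical field name, per the tables above.
ALIASES = {
    "RawIntensity": "RawIntensity1",  # single-field form
    "Background": "Background1",      # single-field form
    "Cy3RI": "RawIntensity1",
    "Cy5RI": "RawIntensity2",
    "Cy3": "RawIntensity1",
    "Cy5": "RawIntensity2",
    "Cy3Bkgrd": "Background1",
    "Cy5Bkgrd": "Background2",
}


def canonical_row(header, values):
    """Map one row of a quantified spot table onto canonical field names."""
    row = {}
    for name, value in zip(header, values):
        # Names without an alias (grid, grid col, ...) are kept unchanged.
        row[ALIASES.get(name, name)] = value
    return row


header = ["grid", "grid col", "grid row", "Cy3", "Cy3Bkgrd", "Cy5", "Cy5Bkgrd"]
row = canonical_row(header, ["1", "1", "1", "2226.8", "32.6", "2345.9", "39.4"])
print(row)
```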
C.4 The GIPO table database file format
The gene-in-plate-order (GIPO) table is used to make the connection
between a spot on a microarray and the plate well corresponding to a
gene. We are working on extending the format so that it will more
easily handle GIPO tables from a variety of sources.
Location  grid  grid col  grid row  plate  plate row  plate col  Clone ID  GenBankAcc  GeneName
. . .
39        A     2         15        2      1          3          1247601   AA763423    "Mus musculus A kinase anchor protein (AKAP-KL) mRNA, alternatively spliced isoform 1, complete cds"
40        A     2         16        2      1          4          1247553   AA763380    Mus musculus bodenin gene
41        A     2         17        2      1          5          1247865   AI465019    "Mouse beta-D-galactosidase fusion protein mRNA, complete cds"
. . .
Table C.4 List of GIPO data file table fields.
These fields define the mapping between a spot's grid coordinates on
the array and its genomic identifier, gene name, its plate, etc.
Field
Description
(opt) field
array field for duplicate genes
grid
array grid name (either A,B,C,... or 1,2,3,... )
grid col
array column within a grid (either A,B,C,... or 1,2,3,... )
grid row
array row within a grid (either A,B,C,... or 1,2,3,... )
(opt+alt) NAME_GRC
alternative specification to "grid, grid col, grid row".
It is generated by the Molecular Dynamics spot quantification software.
(opt) Master Gene ID
This is the master gene identifier
used in MAExplorer. It must be one or more of the identifiers listed
in Table C.4.3. One of
these will be selected as the Master Gene ID (MID)
(req) Gene Name
Master Gene Name.
The GeneName options are listed in Table C.4.1. These alternative
GeneClasses are automatically
recognized from the Gene Name.
(opt) plate
plate name for original gene. If this
is not specified, it uses the grid value.
(opt) plate row
plate row name for original gene. If this
is not specified, it uses the grid row value.
(opt) plate col
plate column name for original gene. If this
is not specified, it uses the grid col value.
(opt) QualCheck
quality check for data indicating "bad" spots or
genes. Current codes are listed in the Table C.4.2
below
Table C.4.1 List of possible Master Gene Names
The Master Gene Name must be defined as one of the following identifiers:
Field
Description
(opt) GeneName
Gene name
(opt) Unigene cluster Name
alternative for GeneName if the latter is
not specified.
Automatic Gene Class naming based on Gene Name
Some Gene Classes are automatically recognized from
the Gene Name including:
Alternative Grid,Row,Column encoding scheme: NAME_GRC
Some quantification programs (e.g. Molecular Dynamics "ImageQuant-NT") specify
"grid, grid_col, grid_row" by a single symbol we denote NAME_GRC, coded
as follows:
GRID- grid#-Rrow#Ccol#
For example:
GRID- 8-R12C11
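A NAME_GRC symbol can be split back into its (grid, grid col, grid row) parts with a regular expression. This is a minimal sketch under two assumptions: the grid is numeric, as in the examples shown (lettered grid names would need a different pattern), and blanks after "GRID-" are optional. The function name parse_name_grc is hypothetical.

```python
import re

# "GRID-", optional blanks, grid number, "-R", row number, "C", column number.
NAME_GRC = re.compile(r"^GRID-\s*(\d+)-R(\d+)C(\d+)$")


def parse_name_grc(symbol):
    """Split a NAME_GRC symbol into grid, grid col and grid row numbers."""
    match = NAME_GRC.match(symbol.strip())
    if match is None:
        raise ValueError("not a NAME_GRC symbol: %r" % symbol)
    grid, row, col = (int(group) for group in match.groups())
    return {"grid": grid, "grid col": col, "grid row": row}


print(parse_name_grc("GRID- 8-R12C11"))
# prints {'grid': 8, 'grid col': 11, 'grid row': 12}
```

Note that the symbol lists the row before the column, while the explicit form lists "grid col" before "grid row".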
Table C.4.2 List of QualCheck codes and their semantics
The data filter "Filter by 'Good Spot data'" may be used in
eliminating bad spot data on a per-gene set basis. This uses the
"QualCheck" field if it is present in the quantified data table. It maps
either 1) an integer numeric code (see Appendix C of the Reference
Manual), 2) an alphabetic code (e.g. Affymetrix "Abs Call") of "P" (or
"G" or "T") to Good Spot, "A" (or "B" or "F") to Bad Spot, and "M" to
Marginal Spot, or 3) a continuous quality value. In this latter case,
QualCheck may be a continuous monotonically increasing floating point
value (e.g. 0.0 to 100.0, or 0.0 to 1.0, -100.0 to +100.0, etc.) in which case a "Spot
Quality" State threshold slider will popup when the filter is invoked.
Additional property value codes may be added in the future.
Status          QualCheck value  Semantics
Good gene       2                the spot data is "Good" (some systems
                                 report this by a NULL quality measure). It has a good gene name.
                                 Alternatively, the letter codes "P", "G", "T" may be used.
Bad gene        4                the spot data is bad, but it has a good gene name.
Bad spot        8                is a non-analyzable spot (e.g. marker, or "Bad", "Not Found",
                                 "Empty", etc.). Alternatively, the letter codes "A", "B", "F" may be used.
Duplicate spot  16               is a duplicate of another gene on the array
Marginal spot   256              is a marginally quantified spot.
                                 Alternatively, the letter code "M" may be used.
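The code table above can be read as a lookup from a QualCheck value to a status. The sketch below shows one way to express it; the function name qualcheck_status is hypothetical, and it treats each integer as an exact code because the text does not say whether codes may be combined. Continuous quality values (case 3 above) are not classified here, since they depend on the threshold slider setting.

```python
# Integer codes and their status, from Table C.4.2.
QUALCHECK_CODES = {
    2: "Good gene",
    4: "Bad gene",
    8: "Bad spot",
    16: "Duplicate spot",
    256: "Marginal spot",
}

# Alternative letter codes and the integer code they stand for.
LETTER_TO_CODE = {"P": 2, "G": 2, "T": 2,
                  "A": 8, "B": 8, "F": 8,
                  "M": 256}


def qualcheck_status(value):
    """Return the status for an integer or letter code, else None."""
    text = str(value).strip().upper()
    if text in LETTER_TO_CODE:
        return QUALCHECK_CODES[LETTER_TO_CODE[text]]
    try:
        code = int(text)
    except ValueError:
        return None  # e.g. a continuous quality value such as "0.75"
    return QUALCHECK_CODES.get(code)


for value in ["P", 8, "M", "0.75"]:
    print(value, "->", qualcheck_status(value))
```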
Table C.4.3 List of possible Master Gene Identifiers
Additional data is used to point to data in external genomic databases
by specifying the identifier. This may be used to dynamically link
genes in the MAExplorer database to Web database servers to bring up
Web pages from these databases. Note the Master ID needs to be
specified and may be any one of the following identifiers. The
appropriate genomic Web browser access will be enabled depending on
the genomic Master ID specified.
(See Section C.1.1 for option notation.)
The fields include:
Field
Description
(opt) Location
alternate spot identifier. E.g., Affymetrix 'probe_set', or
Incyte 'IncyteID', etc. This may be numeric or alphanumeric
(opt) Clone ID
I.M.A.G.E. consortium database clone ID. It may have a
"IMAGE:" or "ATCC:" prefix
(opt) Unigene cluster ID
NCBI UniGene database ID
(opt) dbEST3'
NCBI dbEST database
(opt) dbEST5'
NCBI dbEST database
(opt) GenBankId
NCBI GenBank database
(opt) GenBankId3'
NCBI GenBank database
(opt) GenBankId5'
NCBI GenBank database
(opt) RefSeqID
NCBI RefSeq database
(opt) LocusID
NCBI LocusLink database
(opt) OMIMID
NCBI OMIM database
(opt) SwissProtID
Swiss-Prot database
Table C.4.4 Extending Genomic IDs and associated URLs
MAExplorer allows you to define your own gene identifiers that will
map to external genomic databases. You add the following entries in
sets of 4 to the Configuration database or to the .mae startup file.
These entries will be added to the View menu where you may select the
external genomic database to visit when you have enabled MAExplorer to
launch a browser when clicking on a gene. The following table shows the 4
required fields for 2 entries. There may be any number of external
genomic IDs.
(See Section C.1.1 for option notation.)
Parameter
Value
DataType
Comments
(opt) GenomicMenu1
GenBank
String
Name of the database. This will appear in the View menu
(opt) GenomicURL1
http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=2&form=1&term=
String
URL to which one adds the 'GenomicIDreq' value
(opt) GenomicURLepilogue1
String
epilogue of the URL if any
(opt) GenomicIDreq1
GBID
String
Name of the GenomicID required and that is specified
in the GIPO file as one of its fields
(opt) GenomicMenu2
UniGene
String
Name of the database. This will appear in the View menu
(opt) GenomicURL2
http://www.ncbi.nlm.nih.gov/UniGene/query.cgi?ORG=Mm&CID=
String
URL to which one adds the 'GenomicIDreq' value
(opt) GenomicURLepilogue2
String
epilogue of the URL if any
(opt) GenomicIDreq2
UID
String
Name of the GenomicID required and that is specified
in the GIPO file as one of its fields
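According to the comments in the table, the Web address for a gene is the GenomicURL value, followed by the gene's identifier from the GIPO field named by GenomicIDreq, followed by the epilogue if any. The sketch below illustrates that composition; it is not MAExplorer's code, the function names are hypothetical, the identifier "12345" is a made-up placeholder, and URL-quoting the identifier is an added precaution rather than documented behavior.

```python
from urllib.parse import quote


def genomic_entries(config):
    """Collect the numbered sets of 4 Genomic* parameters from a config dict."""
    entries = []
    n = 1
    while "GenomicMenu%d" % n in config:
        entries.append({
            "menu": config["GenomicMenu%d" % n],
            "url": config["GenomicURL%d" % n],
            "epilogue": config.get("GenomicURLepilogue%d" % n, ""),
            "id_field": config["GenomicIDreq%d" % n],
        })
        n += 1
    return entries


def gene_url(entry, gipo_row):
    """Base URL + identifier from the GIPO row + optional epilogue."""
    gene_id = str(gipo_row[entry["id_field"]])
    return entry["url"] + quote(gene_id, safe="") + entry["epilogue"]


config = {
    "GenomicMenu1": "UniGene",
    "GenomicURL1": "http://www.ncbi.nlm.nih.gov/UniGene/query.cgi?ORG=Mm&CID=",
    "GenomicURLepilogue1": "",
    "GenomicIDreq1": "UID",
}
entry = genomic_entries(config)[0]
print(gene_url(entry, {"UID": "12345"}))  # "12345" is a placeholder ID
```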
C.5 Configuring MAExplorer for use with various types of array data
MAExplorer has been designed so that it may be reconfigured for
different array dependencies including: geometries, number of replicate
fields, scanner dynamic ranges, labeling, etc. When first started,
MAExplorer reads this configuration file and then uses this
information to handle reading different types of array data files that
are subsequently loaded. To make it easier to understand, the entire
table is presented as several sub-tables - however, MAExplorer reads
it as a single table (the default being called
MaExplorerConfig.txt). Note that parameters marked as optional may,
for the most part, be omitted. Many of these may be set from the
MAExplorer menus once the program is started. The reconfigured
state may then be saved (File | Database | SaveAs ... DB) with
these and other state values retained for the next time the particular
startup database is used.
Parameter subset             Function of these parameters
1. Array content & geometry  Describes the content and geometry of the arrays (required)
2. Threshold defaults        Describes the threshold defaults (optional)
3. Array database files      Describes the array specific database files (required)
4. Table field mapping       Describes "mapTF" table,field mapping. This maps user defined names
                             to names required by MAExplorer and is only required if the user names
                             are different from the names MAExplorer expects.
5. URL genomic databases     Describes base addresses of genomic Web DBs (optional).
                             If you do not specify these, default values are supplied from
                             the program.
6. User menus                Describes user-specific menus (optional)
Parameter Value DataType Comments
MAX_FIELDS 1 int # replicate grids/array
MAX_GRIDS 2 int # grids/field
MAX_GRID_COLS 38 int # columns/grid
MAX_GRID_ROWS 27 int # rows/grid
usePseudoXYcoords true boolean use pseudoarray XY coord image - no XY data
gipoFile GIPO.txt File name of GIPO file from
samplesDBfile SamplesDB.txt File name of Samples DB file
dataBase demo String default name of project database
dbSubset demo1 String default database subset name
useRatioData true boolean treat duplicate(F1,F2) data as ratio (F1/F2) - i.e.Cy3/Cy5
EditDate Tue Aug 21 2000 String demo
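A configuration table of this shape can be read into typed values with a short script. The following is a sketch only, assuming a tab-delimited file whose header row is Parameter, Value, DataType, Comments; the function names are hypothetical, and only the "int" and "boolean" data types are converted (anything else is kept as text). The spot count at the end is an inference from the comments on the four MAX_ parameters, not a documented formula.

```python
import csv
import io


def convert(value, data_type):
    """Convert a Value string according to its DataType column."""
    kind = (data_type or "").strip().lower()
    if kind == "int":
        return int(value)
    if kind == "boolean":
        return value.strip().lower() == "true"
    return value  # String and unknown types are kept as text


def read_config(text):
    """Parse a tab-delimited (Parameter, Value, DataType, Comments) table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {row["Parameter"]: convert(row["Value"], row["DataType"])
            for row in reader}


example = (
    "Parameter\tValue\tDataType\tComments\n"
    "MAX_FIELDS\t1\tint\t# replicate grids/array\n"
    "MAX_GRIDS\t2\tint\t# grids/field\n"
    "MAX_GRID_COLS\t38\tint\t# columns/grid\n"
    "MAX_GRID_ROWS\t27\tint\t# rows/grid\n"
    "usePseudoXYcoords\ttrue\tboolean\tuse pseudoarray XY coord image\n"
    "dataBase\tdemo\tString\tdefault name of project database\n"
)
config = read_config(example)

# Inferred spot count: fields x grids/field x columns/grid x rows/grid.
spots = (config["MAX_FIELDS"] * config["MAX_GRIDS"]
         * config["MAX_GRID_COLS"] * config["MAX_GRID_ROWS"])
print(spots)
```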
Table C.5.1 List of array database-specific content and
geometry configuration (Parameter,Value) entries
This table lists most of the options that the user could define. If
the user defines an option, it will override the default set by
MAExplorer. The values are shown for some typical databases. (See
Section C.1.1 for option notation.)
A) Array geometry parameters
Parameter
Value
DataType
Comments
(req) MAX_FIELDS
2
int
# duplicate grids (blocks, patch, etc.) of spots for
each gene in the array (i.e. F1, F2, etc.). Note that Cy3 and Cy5 data
for each spot count as one field.
(req) MAX_GRID_COLS
24
int
# cols/grid in the array
(req) MAX_GRID_ROWS
9
int
# rows/grid in the array
(req) MAX_GRIDS
8
int
# grids in the array
(opt) ignoreExtraFields
FALSE
boolean
if there are additional fields of data in the
GIPO or .quant files, then ignore them. Only use the first rawIntensity
field. Note: this option is not normally used.
(opt) reuseXYcoords
FALSE
boolean
Reuse XY coordinates from first sample
for rest of the samples
(opt) SpotRadius
7
int
(2 to 20 pixels) 50 microns, scroller.
Note: this should be set to about 4 or 5 for a 10000 gene DB.
(opt) swapRowsColumns
FALSE
boolean
set to swap rows and columns in the array
(used with our particular Research Genetics arrays)
(opt) usePseudoXYcoords
FALSE
boolean
use pseudoarray XY coordinates image if there is no
explicit XY spot position data generated by the quantification
software
(future) FIELD_LAYOUT
LtoR
String
fields are Left to Right
(future) FIELDS_ARE_NUMBERED
TRUE
boolean
Data files contain field number.
Otherwise field is extrapolated
(future) GRID_LAYOUT
Horizontal
String
Grids are Left To Right in the array
(future) GRID_PER_ROW
4
int
# grids per row in each field of the array
B) Ratio and background parameters
Parameter
Value
DataType
Comments
(ratio) fluorescentLbl1
Cy3
String
name of dye for fluorescent label 1
(ratio) fluorescentLbl2
Cy5
String
name of dye for fluorescent label 2
(ratio) useRatioData
TRUE
boolean
set if data is Cy3/Cy5 ratio data otherwise
it assumes intensity data for each spot
(opt+ratio) useRatioMedianCorrection
FALSE
boolean
when using ratio data mode (Cy3/Cy5),
use ratio median correction as the default
(opt) useBackgroundCorrection
FALSE
boolean
use background correction as the default on startup
(future) useCy5/Cy3
FALSE
boolean
compute Cy5/Cy3 ratios instead of Cy3/Cy5 ratios
C) Names of database, etc.
We indicate example values by italics.
Parameter
Value
DataType
Comments
(opt) calibDNAname
mouse genomic DNA
String
name for calibration DNA if available - replacing
cloneID in the case where the clones are not yet in the I.M.A.G.E.
database. The particular clone is located using the Plate(grid,row,col)
reported when selecting the current gene.
(opt) classNameX
HP-X 'set'
String
default name of HP-X samples 'set'
(opt) classNameY
HP-Y 'set'
String
default name of HP-Y samples 'set'
(opt) dataBase
MGAP DB
String
name of the database project
(opt) dbSubset
Preg 13 vs Lact 1
String
name of the subset of data from the database
(opt) geoPlatformID
GPL80
String
name of the NCBI Gene Expression Omnibus (GEO) Platform Id
(opt) maAnalysisProgram
Research Genetics Pathways 2.01
String
name of spot quantification program
(opt) yourPlateName
your plate
String
name of researcher's clones if available - used in the
cloneID data field in the case where the clones are not yet in the I.M.A.G.E.
database. The particular clone is located using the Plate(grid,row,col)
reported when selecting the current gene.
(See Table 2.4.1)
(opt) emptyWellName
empty wells
String
what you called empty wells if there are any in the
database.
(See Table 2.4.1)
(opt) EditDate
06-19-00, Lemkin
String
comment why changed
D) Display Views
Parameter
Value
DataType
Comments
(opt) gangSpotFlag
TRUE
boolean
set gang spot display on startup for database
with duplicate spots
(opt) presentationViewFlag
FALSE
boolean
start MAExplorer with larger fonts and graphics
symbols suitable for live presentations
(opt) showEGLflag
FALSE
boolean
show EGL genes on startup from previously saved
database that had EGL genes selected.
(opt) showMouseOver
TRUE
boolean
show mouse-over info when moving the mouse in windows
(opt) useDichromasy
FALSE
boolean
use orange-blue else use red-green color scheme
(opt) viewFilteredSpotsFlag
TRUE
boolean
view filtered spots in the array pseudoimage. If it is
off, it shows just the pseudoarray image without spots passing the
filter or MAExplorer state information.
Note that there are many other parameters reflecting the state of
MAExplorer that are saved in the .mae startup file when doing a (File
| Database | SaveAs...DB) operation. These are reviewed and set from
the MAExplorer menus. These parameters are not listed here - although
they could be used in setting up an initial .mae startup file.
Table C.5.2 List of default threshold database-specific
configuration (Parameter,Value) entries
Some of the default thresholds and sizes may be defined here as
it may be useful to vary them with different types of data.
Parameter
Value
DataType
Comments
(opt) CanvasHorSize
1100
int
pixels, horizontal size of microarray image **DEPRECATED**
(opt) CanvasVertSize
1100
int
pixels, vertical size of microarray image **DEPRECATED**
(opt) fontFamily
SansSerif
String
default text font family. See Font Family for
other fonts. Some fonts look better with some operating systems.
(opt) clusterDistThr
10
float
default cluster similarity threshold in
[0.0 : 100.0], scroller
(opt) maxGenesReported
50
int
max # of genes in highest/lowest gene
report
(opt) maxPreloadImages
4
int
max # HP samples to initially load
(opt) nbrOfClustersThr
6
int
default # clusters for K-means clustering
(opt) pValueThr
0.2
float
default p-value for statistical tests
(opt) spotCVthr
0.25
float
default spot Coefficient of Variation value
(opt) allowNegQuantDataFlag
FALSE
boolean
set if .quant file data has negative intensity values
otherwise it clips the negative values to 0.0
(opt) usePosQuantDataFlag
TRUE
boolean
Filter out genes where .quant file data has negative
intensity values otherwise it uses the negative data
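The two negative-intensity flags above can be read as a small policy. The following is a minimal Java sketch under stated assumptions: the class and method names (QuantIntensityPolicy, stored, passesFilter) are hypothetical and are not part of MAExplorer, and applying the filter to the raw (unclipped) value is also an assumption.

```java
/** Hypothetical helper; names are illustrative, not MAExplorer's API. */
final class QuantIntensityPolicy {
    private final boolean allowNegQuantData; // mirrors allowNegQuantDataFlag
    private final boolean usePosQuantData;   // mirrors usePosQuantDataFlag

    QuantIntensityPolicy(boolean allowNegQuantData, boolean usePosQuantData) {
        this.allowNegQuantData = allowNegQuantData;
        this.usePosQuantData = usePosQuantData;
    }

    /** Value to keep for a spot: negatives are clipped to 0.0 unless allowed. */
    double stored(double rawIntensity) {
        if (!allowNegQuantData && rawIntensity < 0.0) {
            return 0.0;
        }
        return rawIntensity;
    }

    /** True if a gene with this raw intensity survives the positive-data filter. */
    boolean passesFilter(double rawIntensity) {
        // Assumption: the filter looks at the raw value, before any clipping.
        return !usePosQuantData || rawIntensity >= 0.0;
    }
}
```

With the default values in the table (allowNegQuantDataFlag FALSE, usePosQuantDataFlag TRUE), a raw intensity of -5.0 would be stored as 0.0 and the gene would be filtered out.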
Table C.5.3 List of array specific auxiliary database files
(Parameter,Value) entries
This lists the names of the database-specific auxiliary files. Note
that the names of these files may change with the database but the
name of the initial configuration file containing these names (i.e.
MaExplorerConfig.txt) does not change. Optional Parameters
are indicated with a "*" prefix.
(See Section C.1.1 for option notation.)
Parameter
Value
DataType
Comments
(req) gipoFile
GIPO-DB.txt
File
Composite Gene-In-Plate-Order (GIPO) file
containing the spot print order, Clone-IDs, gene names, GenBank
IDs, plate coordinates, etc.
(See Appendix C.4)
(req) samplesDBfile
Samples-DB.txt
File
list of hybridized samples in the database. [Note:
an older deprecated name was "membranesDBfile"].
(See Appendix C.2)
(opt) quantFileExt
.quant
String
alternate quantification spot file name extension
to use instead of ".quant". (You might set it to ".txt")
(See Appendix C.3)
Table C.5.4 List of optional (Table,Field) mappings to
configure specific user's data types
Sometimes user data tables contain the proper data required by
MAExplorer, but the names of the columns (i.e. fields) are
different. MAExplorer can map user (table,field) names to the internal
names it uses. This allows users to maintain their tables in the names
they choose. The following mapTF entries are not required if
the fields in the corresponding tables already have the MAE field
name. Each entry value uses the mapping form
[TableName],[MAE field name],[TableName],[User field name]
Parameter
Value
DataType
Comments
(opt) mapTF
GipoTable,grid,GipoTable,SA
String
GIPO table grid name (numbers or letters)
(opt) mapTF
GipoTable,grid row,GipoTable,R
String
GIPO table row of grid name (numbers or letters)
(opt) mapTF
GipoTable,grid col,GipoTable,C
String
GIPO table column of grid name (numbers or letters)
(opt) mapTF
GipoTable,plate,GipoTable,RG Pl
String
GIPO table plate where clone came from
(opt) mapTF
GipoTable,plate row,GipoTable,RG row
String
GIPO table row of plate where clone came from
(opt) mapTF
GipoTable,plate col,GipoTable,RG col
String
GIPO table column of plate where clone came from
(opt) mapTF
GipoTable,Clone ID,GipoTable,Clone id
String
GIPO name of Clone ID
(opt) mapTF
GipoTable,GeneName,GipoTable,Gene name
String
GIPO table map gene name
(opt) mapTF
GipoTable,Unigene cluster ID,GipoTable,ucid
String
GIPO table UniGene cluster id (if available)
(opt) mapTF
GipoTable,Unigene cluster name,GipoTable,ucn
String
GIPO table UniGene cluster name (if available)
(opt) mapTF
GipoTable,GenBank 3',GipoTable,gb3'
String
GIPO table GenBank 3' id (if available)
(opt) mapTF
GipoTable,GenBank 5',GipoTable,gb5'
String
GIPO table GenBank 5' id (if available)
(opt) mapTF
GipoTable,dbEST 3',GipoTable,est3'
String
GIPO table dbEST 3' id (if available)
(opt) mapTF
GipoTable,dbEST 5',GipoTable,est5'
String
GIPO table dbEST 5' id (if available)
(opt) mapTF
QuantTable,grid,QuantTable,SA
String
Quant table array grid name (numbers or letters)
(opt) mapTF
QuantTable,grid row,QuantTable,R
String
Quant table row of grid name (numbers or letters)
(opt) mapTF
QuantTable,grid col,QuantTable,C
String
Quant table column of grid name (numbers or letters)
(opt) mapTF
QuantTable,RawIntensity,QuantTable,Intensity
String
Quant table RawIntensity data
(opt) mapTF
QuantTable,Background,QuantTable,BkgrdIntens
String
Quant table background intensity
(opt) mapTF
QuantTable,RawIntensity1,QuantTable,Cy3RI
String
Quant table RawIntensity1 Cy3 data
(opt) mapTF
QuantTable,RawIntensity2,QuantTable,Cy5RI
String
Quant table RawIntensity2 Cy5 data
(opt) mapTF
QuantTable,Background1,QuantTable,BkgrdCy3RI
String
Quant table background intensity for Cy3
(opt) mapTF
QuantTable,Background2,QuantTable,BkgrdCy5RI
String
Quant table background intensity for Cy5
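The mapTF entries above amount to a lookup from a user's (table, field) name to the MAE field name. A minimal Java sketch of that lookup follows. The class FieldMapper and its methods are hypothetical names used only for illustration, not MAExplorer's classes, and it assumes field names never contain commas.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of mapTF handling; not MAExplorer's actual classes. */
final class FieldMapper {
    // Key: user table + TAB + user field name. Value: MAE field name.
    private final Map<String, String> userToMae = new HashMap<>();

    /** Registers one mapTF value: [Table],[MAE field],[Table],[User field]. */
    void addMapTF(String value) {
        String[] p = value.split(",", -1);
        if (p.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 comma-separated parts: " + value);
        }
        userToMae.put(p[2].trim() + "\t" + p[3].trim(), p[1].trim());
    }

    /** Returns the MAE field name for a user column, or the column itself if unmapped. */
    String toMaeField(String table, String userField) {
        return userToMae.getOrDefault(table + "\t" + userField, userField);
    }
}
```

For example, after addMapTF("GipoTable,grid,GipoTable,SA"), a user column named SA in the GIPO table resolves to the MAE field grid, while a column that already uses the MAE name passes through unchanged.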
Parameter
Value
DataType
Comments
(opt) dbEstURL
http://www.ncbi.nlm.nih.gov/irx/cgi-bin/birx_doc?dbest+
String
NCBI dbEst server by dbEST ID.
You may use an alternative server.
(opt) GenBankAccURL
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term=
String
NCBI GenBank server by GenBankAcc ID.
You may use an alternative server.
(opt) GenBankCloneURL
http://www.ncbi.nlm.nih.gov/irx/cgi-bin/submit_form_query?TITLE=dbEST+Retrieval+Output&INPUTS=1&BRACKETS=NONE&ADDFLAGS=-b&DB=dbest&NDOCS=10&Q1=
String
NCBI GenBank entry by Clone_ID server.
You may use an alternative server.
(opt) GenBankCloneURLepilogue
[clin]
String
Epilog added after Clone_ID.
You may use an alternative server.
(opt) IMAGE2GenBankURL
http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi?ORG=Mm&ACC=IMAGE:
String
lookup GenBank from CloneID server.
You may use an alternative Image to GenBank server.
The "ORG=Mm" should be changed to reflect
the proper species, eg. "ORG=Hs" for human, etc.
(opt) IMAGE2GIDURL
http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi?ORG=Mm&GID=IMAGE:
String
NCI/CIT lookup GenBank GID from CloneID server.
You may use an alternative CloneID to GenBank GID server.
The "ORG=Mm" should be changed to reflect
the proper species, eg. "ORG=Hs" for human, etc.
(opt) IMAGE2unigeneURL
http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi?ORG=Mm&CLONE=IMAGE:
String
NCI/CIT lookup UNIGENE from CloneID server.
You may use an alternative CloneID to UniGene server.
The "ORG=Mm" should be changed to reflect
the proper species, eg. "ORG=Hs" for human, etc.
(opt) unigeneURL
http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=
String
NCBI UNIGENE by Clone ID server.
You may use an alternative UniGene server.
The "ORG=Hs" should be changed to reflect
the proper species, eg. "ORG=Mm" for mouse, etc.
(opt) locusLinkURL
http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi?SITE=104&V=1&ORG=Hs&ORG=Mm&ORG=Rn&ORG=Dr&ORG=Dm&Q=
String
NCBI LocusLink by GenBank ID server.
The LocusLink server is accessed by LocusID
gbid2LocusLinkURL
http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi?SITE=104&V=1&ORG=Hs&ORG=Mm&ORG=Rn&ORG=Dr&ORG=Dm&Q=
String
NCBI LocusLink by LocusID server.
The LocusLink server is accessed by LocusID
(opt) swissProtURL
http://www.expasy.ch/cgi-bin/get-sprot-entry?
String
SwissProt by SwissProt ID
(opt) omimURL
http://www.ncbi.nlm.nih.gov:80/entrez/dispomim.cgi?id=
String
NCBI OMIM database by OMIM ID
(opt) pirURL
http://pir.georgetown.edu/cgi-bin/iproclass/iproclass?choice=entry&id=
String
PIR ProClass database by SwissProt ID
(opt) GeneCardURL
http://bioinfo.weizmann.ac.il/cards-bin/carddisp?
String
GeneCard DB server.
You may use an alternative server.
(opt) histologyURL
http://mammary.nih.gov/models/
String
e.g. NIDDK MGAP histology DB server. If you have
an alternative histology model server, put it here.
(opt) modelsURL
http://mammary.nih.gov/models/
String
e.g. NIDDK MGAP mouse models DB server.
You may use an alternative models server.
(opt) proxyServer
http://www.lecb.ncifcrf.gov/cgi-bin/maeProxySvr?
String
NCI/LECB proxy server to access servers outside of the
Java "sandbox". If you set up MAExplorer on your local server, then
this should point to a proxy server on your system.
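Most of the URL values in the table above end in a query prefix (for example "term=" or "Q1="), which suggests that the identifier is appended to form the full request. A minimal Java sketch of that idea follows. GenomicLinkBuilder and build are hypothetical names, and URL-encoding the appended part is an assumption made here for safety rather than documented MAExplorer behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical sketch: build a lookup URL from a configured prefix and an ID. */
final class GenomicLinkBuilder {
    private GenomicLinkBuilder() {}

    /**
     * @param urlPrefix configured value, e.g. the GenBankAccURL entry
     * @param id        identifier to look up (GenBank accession, Clone ID, ...)
     * @param epilogue  optional suffix such as GenBankCloneURLepilogue, or null
     */
    static String build(String urlPrefix, String id, String epilogue) {
        String tail = (epilogue == null) ? id : id + epilogue;
        try {
            // Assumption: the appended part is URL-encoded before use.
            return urlPrefix + URLEncoder.encode(tail, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

A call such as build(genBankCloneUrl, cloneId, "[clin]") would append the Clone ID followed by the encoded epilogue to the configured GenBankCloneURL prefix.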
Parameter
Value
DataType
Comments
(opt) HelpMenu1
List of hybridized samples
String
Help sub menu URL
(opt) HelpMenu2
MGAP animal models
String
Help sub menu URL
(opt) HelpMenu3
MGAP home page
String
Help sub menu URL
(opt) HelpURL1
http://www.lecb.ncifcrf.gov/mae/maeHybridizations.html
String
Help sub menu URL
(opt) HelpURL2
http://mammary.nih.gov/models/
String
Help sub menu URL
(opt) HelpURL3
http://mammary.nih.gov/
String
Help sub menu URL
Parameter
Value
DataType
Comments
(opt) PluginMenuName1
New Cluster plot
String
Plugin sub menu string
(opt) PluginMenuStubName1
PlotMenu:cluster
String
name of Plugin menu stub to add menu entry
(opt) PluginClassFile1
NewClusterPlot.jar
String
Name of class file
(opt)sPluginCallAtStartup1
InstallInMenu
String
handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall"
(opt) PluginMenuName2
New sample report
String
Plugin sub menu string
(opt) PluginMenuStubName2
ReportMenu:sample
String
name of Plugin menu stub to add menu entry
(opt) PluginClassFile2
NewSampleReport.jar
String
Name of class file
(opt)sPluginCallAtStartup2
InstallInMenu
String
handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall"
(opt) PluginMenuName3
Client-server
String
Plugin sub menu string
(opt) PluginMenuStubName3
-none-
String
name of Plugin menu stub to add menu entry
(opt) PluginClassFile3
ClineServerMAE.class
String
Name of class file
(opt)sPluginCallAtStartup3
InstallInMenu
String
handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall"
List of acceptable Menu stub names for: PluginMenuStubName
When MAEPlugins are available, you will be able to insert them into various
parts of the MAExplorer menu. If the menu stub is not found, it will
install them in the generic "Plugin" pull-down menu.
C.6 Using the Cvt2Mae 'wizard' tool to convert your array
data for use with MAExplorer
In order to use MAExplorer on your data, you must convert your data
files into the data formats described in
Appendix C and Appendix D. Although
we and others have done this by editing user's data files into the
required formats, it is a non-trivial process.
Handling grid geometries that don't fit our model
If your array geometry does not conform to one of those handled by
MAExplorer (see Section 1.1
Gene coordinate numbering on the microarray), then treat the data
as a list of spots. In Cvt2Mae Wizard panel "[2] Grid geometry data",
select the checkbox "Use # spots (BELOW), else grid-geometry (ABOVE)"
and then enter the total number of spots in the line below. This will
construct an arbitrary pseudoarray geometry to serve as a basis to
display the microarray pseudoimage (see the Algorithm for constructing
the pseudo array from a list of spots in Appendix C.6). For
example, this might be used in the case where your arrays used
meta-grids.
Converting data for known chip "Array Layouts" or lists of quantified
spots
Assuming the desired array is in the list of chip array layouts,
follow the eight-step process below with steps 3 and 5 omitted. If you
must describe your own array data using the <User-defined> chip
array layout, then do step 3. If your chip is one of the
chips listed in the chip Array Layouts list, then you may be able to
do an "Edit Layout" to modify the description without having to define
the chip layout from scratch - in which case do step 3.
Click on the "Start.mae" file to start MAExplorer on the new
data. This assumes that you have previously installed MAExplorer.
Algorithm: Generation of a pseudoarray image geometry if no array geometry is specified
MAExplorer requires the data in the GIPO and Quant files be specified
by a spot position. This is indicated by the array spot geometry of
(#fields, #grids, #rows/grid, #columns/grid). The #fields is the
number of duplicated sets of grids if available - it is 1
otherwise. This 4-tuple must be specified in the Configuration file.
However, some array data does not have spot geometry position data
available. The alternative is to generate a pseudoarray geometry. This
is possible since the pseudoarray image in MAExplorer is used simply
to indicate success of the data filter or relative differences
depending on the "Plot | Show Microarray" option. In Cvt2Mae we
generate a visually appealing pseudoarray image geometry if no array
geometry is specified with the data (e.g. Affymetrix data, etc). The
algorithm presented below will generate a geometry
(nGrids,nGridRows,nGridCols) that is compatible with the
visual use of the pseudoarray. The only input assumed is
nRowsExpected, the number of spots in the microarray (rows in
the database input file). The number of spots in the array is computed
automatically and the option to use the pseudoarray instead of the
actual array geometry is selected in the
Edit Layout Wizard for Grid Geometry.
OPT_GRID_SIZE = 1200; /* Optimal grid size for MAExplorer viewing */
ROWS_TO_COLS_ASPECT_RATIO = 3.0/4.0; /* desired rows/cols aspect ratio for a grid */
n = nRowsExpected; /* total # of spots in the array */
extra = 0; /* # of extra grid cols required */
/* Estimate # of grids. Assume a square aspect ratio */
if(n <= OPT_GRID_SIZE)
  nGrids = 1;
else
  nGrids = (n / OPT_GRID_SIZE)+1;
/* Estimate rows (r) and columns (c) from a rectangular grid
 * where cols = (4/3) rows.
 * Then, c = (4/3)r and r*c = area.
 * Then (4/3)*r*r = area or
 * r = sqrt((3/4)*area).
 */
if(nRowsExpected > 0)
  while(true)
  { /* iterate to optimal size */
    gridSize = n/nGrids;
    nGridRows = sqrt( ROWS_TO_COLS_ASPECT_RATIO * gridSize );
    nGridCols = (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
    nGridCols += extra;
    estTotSize = (nGrids * nGridRows * nGridCols);
    if(estTotSize > nRowsExpected)
      break;
    else
      extra++; /* keep trying until meet criteria */
  } /* iterate to optimal size */
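The pseudocode above can be sketched as a small Java method. This is an illustration only, not the Cvt2Mae source: the class name PseudoArrayGeometry and method compute are hypothetical, the pseudocode does not state its numeric types so integer truncation is assumed, and a guard that forces at least one row is added here because a zero row count would never satisfy the loop's exit test.

```java
/** Hypothetical Java rendering of the pseudoarray geometry pseudocode. */
final class PseudoArrayGeometry {
    static final int OPT_GRID_SIZE = 1200;                    // optimal spots per grid
    static final double ROWS_TO_COLS_ASPECT_RATIO = 3.0 / 4.0; // rows/cols per grid

    private PseudoArrayGeometry() {}

    /** Returns {nGrids, nGridRows, nGridCols} for nRowsExpected spots. */
    static int[] compute(int nRowsExpected) {
        if (nRowsExpected <= 0) {
            throw new IllegalArgumentException("nRowsExpected must be > 0");
        }
        // Estimate the number of grids.
        int nGrids = (nRowsExpected <= OPT_GRID_SIZE)
                ? 1
                : (nRowsExpected / OPT_GRID_SIZE) + 1;
        int gridSize = nRowsExpected / nGrids;

        // r = sqrt((3/4) * area), c = (4/3) * r; truncation is an assumption.
        int nGridRows = (int) Math.sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize);
        nGridRows = Math.max(1, nGridRows); // added guard, not in the pseudocode
        int baseCols = (int) (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);

        // Add extra columns until the geometry holds more than nRowsExpected spots.
        int extra = 0;
        while ((long) nGrids * nGridRows * (baseCols + extra) <= nRowsExpected) {
            extra++;
        }
        return new int[] {nGrids, nGridRows, baseCols + extra};
    }
}
```

Under these assumptions, compute(1000) gives a single grid of 27 rows by 38 columns (1026 positions), and compute(10000) gives 9 grids of 28 rows by 40 columns (10080 positions).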
Appendix D. Use of MAExplorer as a stand-alone application
The MAExplorer program is used primarily as a stand-alone Java
Application. The same Java program may also be used as a Java Applet
that runs within a Web browser. Both the applet version and the
stand-alone version may be run on a stand-alone computer without an
Internet connection if the microarray database data files reside on
that machine. The stand-alone Java version is more robust and has some
important advantages discussed in the Introduction and Appendix E.2.
D.1 Installing MAExplorer as stand-alone application
It is possible to download MAExplorer for data-mining either
stand-alone on your personal computer or in setting up a Web site
for publishing your array data (like the MGAP site).
D.2 Downloading MAExplorer for stand-alone use with other arrays
After MAExplorer is installed, you may start it by clicking on a
startup icon for one of the saved databases (Windows) or running the
program (Unix, Macintosh). In Windows, you may also start it from the
Window's "Start" menu. This stand-alone version enables you to save
data on your local machine and to run MAExplorer independent of an
Internet connection as well as other features that are discussed
throughout this manual.
D.2.2.1 Subsequent updating of only the MAExplorer JAR file from the
MAExplorer server
Generally, you can greatly shorten the subsequent download of new
versions of MAExplorer by downloading only the MAExplorer JAR file. This
will work most of the time, as most often this is the only change in the
distribution (not including the documentation). If this causes problems, it may mean that the
other files have changed and you may have to reinstall the complete
distribution as in Appendix D.2 above.
Additional instructions for Installing MAExplorer on Sun Solaris
These instructions are valid for use with both TWM and CDE window
managers. If you have problems with the Sun installer, you may need to
update your Solaris OS system patches using a recent patch set.
It may involve more than a single patch. It is the latest Recommended
Patch Cluster from Sun. We STRONGLY recommend having
your System Administrator do this for you if you have not done this
before. Point your Web browser to:
http://sunsolve.Sun.COM/pub-cgi/show.pl?target=patches/patch-access
and choose the appropriate patch set for the version of Solaris (2.6,
7, or 8, etc.) that you are running. Do not choose any of the x86 versions
unless you are running Solaris x86. Click on either the Download HTTP
option or Download FTP option, and click the GO button to download the
patch set.
If you requested it, note that a Java virtual machine is included with
this download. It will run automatically when you run the shell
script.
D.2.2 Enabling MAExplorer to fetch data from a microarray Web
server
When run as a stand-alone application, MAExplorer can be set up to
read data from a MAExplorer enabled Web database server and to cache
the data to your local computer. The following lists the three
variables that should be set in the .mae startup database
file (see full
table in discussion on setting up the configuration file). These
variables may be set from the Edit menu Preferences submenu Use Web
DB and Web DB data caching toggle switches, and the URL of
the database by the Set Web DB command.
Parameter
Value
DataType
Comments
(opt) enableFIOcaching
TRUE
boolean
enable caching data files from Web server on local computer
(opt) saCodeBase
http://www.lecb.ncifcrf.gov/mae/
String
Web database to use to get the data
(opt) useWebDB
TRUE
boolean
set to get data from a Web database
D.2.3 Enabling MAExplorer cache to save Web data on local
computer
When run as a stand-alone application, MAExplorer can be set up to
cache data from a MAExplorer enabled Web database server to your local
computer. You can enable caching for future access using the Edit
menu Preferences submenu Web DB data caching toggle switch.
D.3 Starting MAExplorer by clicking on a .mae file
Once the MAExplorer stand-alone application has been installed and
registered with the operating system, then for Windows (and other
operating systems when you can register files for startup) just click
on the icon for any file with a .mae file extension to start
MAExplorer on that data. For Unix systems you can start MAExplorer on a
.mae file by giving the full installation path:
(installation path)/MAExplorer MAE/(some startup file).mae
or by first adding the installation directory to your path:
set path = ($path <MAExplorer installation path>)
MAExplorer MAE/(some startup file).mae
D.4 The data file format for .mae startup files
When MAExplorer is used as a stand-alone application, it first reads a
(tab-delimited) startup file. This file contains the names of the
hybridized samples to be loaded as well as some of the additional
parameters listed here. Note most of these parameters may be specified
in the configuration file as defaults and therefore do not need to be
included in the .mae file unless you wish to override the configuration
values. (See the Table C.5
tables for a list of these parameters.)
Name
Value
image1
C57B6-L1-30min
image2
C57B6-L3-1hr
image3
C57B6-L10-29hrs-1
image4
Stat5a.--.L1-30min
Name
Value
DataType
Comments
(opt) maxPreloadImages
4
int
override the number of samples (called images) to
actually load. This may be less than the number of image
entries.
(opt) configFile
MaExplorerConfig-MGAP.txt
String
name of Configuration file if not MaExplorerConfig.txt
(opt) dataBase
MGAP DB
String
name of this specific database
(opt) dbSubset
Pregnancy 13 days: C57BL/6 vs. stat5a (-,-), 8 samples
String
title for this subset name of the database
(opt) Xlist
1,2,3
String
hybridized samples for initial HP-X 'set'.
Corresponding to image1, image2, etc. Empty if not defined - may be
defined using the Choose HP-X(Y,E) in the File menu.
(opt) Ylist
4
String
hybridized samples for initial HP-Y 'set'.
Corresponding to image1, image2, etc. Empty if not defined - may be
defined using the Choose HP-X(Y,E) in the File menu.
(opt) Elist
1,2,3,4
String
hybridized samples for initial HP-E 'list'.
Corresponding to image1, image2, etc. Empty if not defined - may be
defined using the Choose HP-X(Y,E) in the File menu.
(opt) classNameX
C57B6 lactation (days 1,3,10)
String
Experimental class name for the HP-X 'set'
of hybridized samples
(opt) classNameY
Stat5a (-,-) lactation day 1
String
Experimental class name for the HP-Y 'set'
of hybridized samples
(opt) noMsgReporting
TRUE
boolean
If set TRUE, used with Applet only to not send loading
status message.
(opt) reuseXYcoords
FALSE
boolean
If set TRUE and the quantified data files have the (x,y)
coordinates for each spot, then use the same coordinates for all
subsequent data files so that the arrays can be superimposed (for Flickering
two HPs).
(opt) usePseudoXYcoords
FALSE
boolean
If set TRUE, force MAExplorer to generate pseudoarray (X,Y)
spot coordinates and ignore (X,Y) data in the quantified spot files if
it exists. This will be set to TRUE automatically if there are no
(X,Y) data fields in the quantified spot files.
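As an illustration of the startup-file layout described above, the following Java sketch parses tab-delimited Name/Value lines and extracts the sample list and the HP-X/Y/E index lists. It is a sketch under stated assumptions, not MAExplorer's reader: the class MaeStartupFile and its methods are hypothetical, it takes the file contents as a String rather than reading from disk, it assumes image entries are numbered consecutively from image1, and any header row would simply be stored as an ordinary entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a .mae startup file reader (Name TAB Value per line). */
final class MaeStartupFile {
    final Map<String, String> params = new LinkedHashMap<>();

    /** Parses the text of a startup file; lines without a tab are skipped. */
    static MaeStartupFile parse(String text) {
        MaeStartupFile f = new MaeStartupFile();
        for (String line : text.split("\\r?\\n")) {
            String[] cols = line.split("\t", -1);
            if (cols.length < 2 || cols[0].trim().isEmpty()) {
                continue;
            }
            f.params.put(cols[0].trim(), cols[1].trim());
        }
        return f;
    }

    /** Sample names from image1, image2, ..., limited by maxPreloadImages if set. */
    List<String> samplesToLoad() {
        List<String> names = new ArrayList<>();
        for (int i = 1; params.containsKey("image" + i); i++) {
            names.add(params.get("image" + i));
        }
        String max = params.get("maxPreloadImages");
        if (max != null) {
            int n = Integer.parseInt(max);
            if (n >= 0 && n < names.size()) {
                return new ArrayList<>(names.subList(0, n));
            }
        }
        return names;
    }

    /** Parses Xlist, Ylist or Elist: comma-separated 1-based image numbers. */
    List<Integer> indexList(String key) {
        List<Integer> out = new ArrayList<>();
        String v = params.get(key);
        if (v == null || v.isEmpty()) {
            return out; // empty if not defined, as the table describes
        }
        for (String s : v.split(",")) {
            out.add(Integer.parseInt(s.trim()));
        }
        return out;
    }
}
```

For a file with image1 to image4 and an Xlist of 1,2,3, indexList("Xlist") returns the image numbers of the HP-X 'set', which can then be looked up in the list of sample names.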
D.5 Using MAExplorer as an Applet on your computer
It is possible to create a Web site to publish users' data using the MAExplorer.jar file to
support your private microarray database Web site. (Note that you can
also get the MAExplorer.jar file from the directory where you
installed MAExplorer on your computer). You might choose to mimic the
way we did the
http://www.lecb.ncifcrf.gov/mae MGAP Web site or organize it
differently. You need to do the following:
<HTML>
<HEAD>
<TITLE>MAExplorer Startup: C57B6 Pregnancy vs Lactation</TITLE>
</HEAD>
<BODY>
<H2>MAExplorer Startup: C57B6 Pregnancy vs Lactation</H2>
This startup database will start the MAExplorer. It contains a subset
of the database consisting of four C57B6 mammary development
hybridized samples (HP): two each for pregnancy and lactation.
<APPLET CODE=MAExplorer.class ARCHIVE=MAExplorer.jar
WIDTH=10 HEIGHT=10 ALIGN=absmiddle>
<PARAM NAME=configFile VALUE=MaExplorerConfig-MGAP.txt>
<PARAM NAME=dbSubset VALUE="C57B6 pregnancy vs lactation">
<PARAM NAME=image1 VALUE=C57B6-p13.1>
<PARAM NAME=image2 VALUE=C57B6-L1-30min>
<PARAM NAME=image3 VALUE=C57B6-p13.2poly-A>
<PARAM NAME=image4 VALUE=C57B6-L1-total>
<PARAM NAME=Xlist VALUE=1,3>
<PARAM NAME=Ylist VALUE=2,4>
<PARAM NAME=Elist VALUE=1,3,2,4>
<PARAM NAME=classNameX VALUE=Pregnancy>
<PARAM NAME=classNameY VALUE=Lactation>
(Sorry, you need a Java-capable browser to view this.)
</APPLET>
</BODY>
</HTML>
D.6 List of startup .mae files included in the download installation
This is a list of the .mae startup files from the MAExplorer MGAP
database. These are included with the distribution for use in the
tutorials, etc. They include data from 50 hybridized samples of mouse
mammary breast tissue including normal and some knockout
samples. There are about 1700 duplicate clones on the arrays which are
membranes printed by Research Genetics and hybridized by the MGAP
group. See the MGAP
home page for more information on these samples. The .mae files
are available as separate files
http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database/. The data is
also packaged as a zip file
http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.zip, and a
Unix tar file
http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.tar.
.mae file name
X vs Y comparison
# of hybridizations
C57vsDevModels-15probes.mae
HP-XY is C57B6 vs developmental models
15
C57vsDevModels-15probes-cache.mae
HP-XY is C57B6 vs developmental models (with cache)
15
C57vsDevModels-38probes.mae
HP-XY is C57B6 vs developmental models. HP-E is all samples
38
Lact1-C57vsStat5a-38probes.mae
HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-). HP-E is all samples
38
Lact1vs10-10probes.mae
HP-XY is C57B6 Lactation day 1 vs Lactation day 10. HP-E is all samples
10
Lact1vs10-38probes.mae
HP-XY is C57B6 Lactation day 1 vs Lactation day 10. HP-E is all samples
38
Lact-C57vsStat5a-5probes.mae
HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-)
5
Lact-C57vsStat5aCEBPnull-19probes.mae
HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-) and
CEBP-null, HP-E has samples of other tissues
19
MAEstartupDefault.mae
none
none
Preg13day-C57vsStat5a-19probes-cache.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-).
HP-E has samples of other tissues (with cache)
19
Preg13day-C57vsStat5a-19probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-).
HP-E has samples of other tissues
19
Preg13day-C57vsStat5a-38probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-).
HP-E is all samples
38
Preg-C57vsStat5a-4probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-)
4
Preg13VsLact1-18probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Lactation day 1. HP-E is all samples
18
Preg13VsLact1-38probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Lactation day 1. HP-E is all samples
38
Preg-C57vsStat5a-8probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-)
8
Preg13day-Stat5aVsCEBP-null-38probes.mae
HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-) and
CEBP-null, HP-E is all samples
38
reuseXY-Preg13day-C57vsStat5a-38probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). Use
XY coords of first probe for remainder for flickering. HP-E is all
samples
38
reuseXY-Preg-C57vsStat5a-8probes.mae
HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). Use
XY coords of first sample for remainder for flickering
8
MGAP-50samples.mae
C57B6 day 13 preg. vs day 1 lact., 50 samples
50
OCL-P13L1L10Stat5a--15probes.mae
replicates of C57B6 (pregnancy day 13, lactation
days 1 and 10) and stat5a (-,-), 15 samples. The database also includes
4 additional condition sets of this data and an Ordered Condition List
of the 4 conditions (in the State/ directory). This may be used to
demo the OCL F-test filter.
15
Appendix E. Design issues
This appendix addresses a number of key design issues on the
implementation of MAExplorer and the implications they have on its
efficiency. The ordinary user of MAExplorer need not be concerned with
any of these issues. A PowerPoint presentation describing the class structure of the "Software design of the
MAExplorer data mining tool" is available as either an Adobe Acrobat file (PDF) or a PowerPoint file (PPT).
E.1 Internal data structures design to facilitate direct manipulation
MAExplorer was constructed using a number of fundamental data objects
including clones (genes), hybridized samples (membranes or glass
arrays), tables, etc. organized using an object-oriented methodology
enforced by Java. Sets of genes are implemented as bit sets for
efficiency in both storage and set-theoretic operations. With a set
being implemented as 64 bits/word, a set intersection, union or
difference can be performed on 64 genes in parallel in one logical
(e.g. AND, OR, XOR) computer instruction. This makes the data filter
quite efficient when computing the intersection of many gene sets.
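The bit-set idea can be illustrated with the standard java.util.BitSet class, which stores bits in 64-bit words. This is a sketch of the technique only: GeneSetDemo and intersectAll are hypothetical names, the gene index is assumed to be a 0-based integer, and MAExplorer's own gene-set classes may differ.

```java
import java.util.BitSet;

/** Hypothetical demo: gene sets as bit sets, filter as intersection. */
final class GeneSetDemo {
    private GeneSetDemo() {}

    /**
     * Returns the genes (by index) that are members of every filter set.
     * The input sets are not modified.
     */
    static BitSet intersectAll(int nGenes, BitSet... filters) {
        BitSet result = new BitSet(nGenes);
        result.set(0, nGenes);      // start with all genes selected
        for (BitSet f : filters) {
            result.and(f);          // word-wise AND, 64 genes per word
        }
        return result;
    }
}
```

Union and difference follow the same pattern using BitSet.or and BitSet.andNot, and cardinality() gives the number of genes remaining in the resulting set.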
When ordered gene lists are required, memory and compute intensive
lists are used - but only when needed. Tab-delimited ASCII is used as
the basic I/O file type for all types of data. This simplifies I/O
and allows data to be prepared with a variety of systems including
Excel, array quantification programs, relational database systems,
etc.
E.2 Approaches to data mining: client-centric and server-centric models
There is a range of approaches for performing data mining of
microarray data over the Internet. However, all assume rapid access
to underlying databases and the ability to transform data from one
presentation mode to another where differences might be easily
observed. One extreme is the server-centric model using CGI or
Applets in a Web browser. This assumes that all data search and analysis
is performed on a back-end server and graphic or tabular results from
the server are sent back to the researcher over the Internet. The
server-centric model has the advantage of keeping all user data
up-to-date, but the disadvantage of performing all computations and
graphics generation on the back-end server. Relying so much on the
server for major computations and graphics generation can result in
significant delays if the networks or servers are heavily loaded. The
other extreme is the client-centric model. Here all of the data being
analyzed is copied to a user's computer and computationally expensive
analyses are done there. This has the disadvantage for the user of
possibly not having the most up-to-date data to analyze as well as
setup time overhead. However, it does distribute the computational
load, allowing more effective data mining with many alternate views
and avoiding excessive delays during a data mining session. In both
the Web browser applet and the stand-alone application, data is
downloaded to MAExplorer. The difference is that the latter has access
to the local file system, along with some additional capabilities.
Approach
Advantage (+)
disadvantage (-)
Feature
Client-centric a)
+
Java programs run (pretty much) on all operating system
platforms as either stand-alone or applets (in browsers)
Client-centric b)
+
handles rapid response required for direct manipulation on
the new generation of very fast desktop computers
Client-centric c)
+
stand-alone version may be restarted quickly
from local data or data cached from the Web server
Client-centric d)
+
size limitations are not a problem with
stand-alone Java applications
Client-centric e)
+
Java plug-ins allow prototyping new local and Web DB
analysis method functionality by any group of users
Client-centric f)
-
for the applet version, there is slow startup
because the program and all data has to be downloaded each time it is run
Client-centric g)
-
difficult to build large stable Web-applets handling very
large data sets. However, stand-alone applications don't have this problem
Client-centric h)
-
for the stand-alone application version,
it must be installed on the client's computer where there might be some
level of incompatibility
Approach
Advantage (+)
disadvantage (-)
Feature
Server-centric a)
+
may have better resources for very large data sets but with dependence on server
Server-centric b)
+
faster startup than downloaded applet since minimal GUI is required
and data does not have to be loaded before computation requests may be
made to the server
Server-centric c)
+
may be easier to prototype and distribute new functionality using third
party software such as RDBMS, S-plus, etc. using centralized CGI or
servlets where only one copy is required on the server
Server-centric d)
-
susceptible to Internet traffic bandwidth problems
for large numbers of users
Server-centric e)
-
susceptible to server-load dependencies for large numbers of users
Server-centric f)
-
difficult to get very rapid response for direct
manipulation for data mining
E.3 Conversion of microarray data files to MAExplorer format using
Cvt2Mae
A tool is being developed that converts microarray data files, both
commercial and one-of-a-kind research data to a complete MAExplorer
data format. Input data will be tab-delimited, although it may be possible
to use XML data at some point. When the tool becomes available, it will
be announced on the MAExplorer home page and in this manual.
Cvt2Mae data converter
Because it is difficult to manually edit user's microarray quantified
data files, we constructed the
Cvt2Mae data converter program (also see Appendix C.6). The idea
is to create array layouts for known array chips and to let the user
define their own for specialized arrays. These user-defined layouts
may then be saved and used in subsequent data conversions. The basic
problem of data conversion is that of "field picking" to map user data
fields to those required by MAExplorer, and of setting the appropriate
options in the MAExplorer configuration files. User-interactive
wizards query the user and then use this information to perform the
conversion, generating output data files that are ready to use with
MAExplorer. Cvt2Mae then generates the directory tree of required data
files described in Appendix C.
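The "field picking" idea can be illustrated with a minimal Java sketch. This is not Cvt2Mae's code: the class FieldPicker, its methods, and the field names used in the usage note are hypothetical and chosen only for illustration. It records which user column feeds each target field, then picks those columns out of a tab-delimited data row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "field picking"; not part of Cvt2Mae. */
final class FieldPicker {
    /** Target field name -> user column name (insertion order = output order). */
    private final Map<String, String> targetToUserField = new LinkedHashMap<>();

    /** Record that targetField is filled from the user's column userField. */
    void assign(String targetField, String userField) {
        targetToUserField.put(targetField, userField);
    }

    /** Returns the tab-delimited output header line. */
    String outputHeader() {
        return String.join("\t", targetToUserField.keySet());
    }

    /** Picks the mapped columns out of one tab-delimited data row. */
    String convertRow(String userHeader, String userRow) {
        List<String> header = Arrays.asList(userHeader.split("\t", -1));
        String[] cells = userRow.split("\t", -1);
        List<String> out = new ArrayList<>();
        for (String userField : targetToUserField.values()) {
            int col = header.indexOf(userField);
            if (col < 0 || col >= cells.length) {
                throw new IllegalArgumentException("Missing input field: " + userField);
            }
            out.add(cells[col]);
        }
        return String.join("\t", out);
    }
}
```

For instance, after assign("GeneName", "Description") and assign("RawIntensity", "Signal"), a row from a file with the header "Probe Set, Signal, Description" (tab-delimited) would be reduced to its Description and Signal cells, in that order. The field names here are assumptions, not the names required by MAExplorer.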
E.4 Extending MAExplorer functionality using Java Plugins
We are adding the ability for users to add their own Java Plug-in
Extensions to MAExplorer. These will extend capabilities of the core
MAExplorer program to other analysis methods written by users. The MAEPlugins Web site will provide an
Open Java API, open-source Java code examples, our plugins and donated
plugins, and links to plugins at other Web sites. Typical plug-ins
include: normalization, Filters, PCA, clustering, client-server,
Web-server functional analysis of cluster results, etc. We group these
into three types of new analytic functionality:
The MAExplorer Open Java API (Applications Programming Interface) will
allow users to get at all data structures without understanding the
details of the system. The specialized application classes are derived
from the GatherScatterAPI class which can access all of the internal
MAExplorer data structures. This allows us to improve and change the
internal data structures without causing problems with plugins using
those data structures.
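The insulation described above can be sketched in Java as a small facade. The classes InternalSampleStore and SampleFacade are hypothetical and are not MAExplorer classes; the sketch only shows the general idea of handing plugins copies of data through access methods, so that the internal representation can change without affecting plugin code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical internal structure; free to change between releases. */
final class InternalSampleStore {
    final List<String> names = new ArrayList<>();
}

/** Hypothetical access class: plugins see only copies of the data. */
final class SampleFacade {
    private final InternalSampleStore store;

    SampleFacade(InternalSampleStore store) {
        this.store = store;
    }

    /** Returns a copy, so a plugin cannot modify the internal list. */
    String[] getSampleNames() {
        return store.names.toArray(new String[0]);
    }

    /** Returns the number of samples currently held. */
    int getNbrSamples() {
        return store.names.size();
    }
}
```

If the internal list were later replaced by another structure, only the facade methods would need to change, and plugin code calling getSampleNames() would be unaffected.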
E.5 Web database server design
Although MAExplorer can be run stand-alone on a user's computer,
additional capabilities may be made available with support from the
back-end Web database server. This server design, used with the MGAP
database, includes several distinct functions (Figure 1). The primary
one is the hosting of login-protected microarray quantitative data and
auxiliary flat files required to support basic MAExplorer
operations. These "flat files" could be synthesized on the fly from
searches on a relational database server that is part of the
microarray database Web server. The public database does not require a
login while the collaborator subset of the database does.
Downloading the stand-alone MicroArray Explorer (current release)
You may freely download and install the
current stable release of the stand-alone version of the
MAExplorer program. You are free to use or redistribute MAExplorer (see disclaimer). We also include a
subset of 50 Mammary Genome Anatomy Program (MGAP) hybridized
sample data to run stand-alone on your computer platform. These
may also be downloaded directly as:
SourceForge
Download MAExplorer Installer
Distribution contents
This will not affect any of your
other Java applications or Web browsers as it is used only with
MAExplorer.
1. Procedure for downloading and installing MAExplorer on your
computer
1. Click here to select the current
installer for your operating system. This Web page allows you to
select the operating system you are using. If you have problems
downloading the installer with Netscape 4.7x or later, then try
Internet Explorer 5.0. It could be a Mime/type problem with your
browser setup.
Other systems will be added as installers become available through
InstallAnywhere (www.ZeroG.com).
1.1 The Default Installer
Alternatively, you can use the default installer that is selected for
your computer. If you want to control where
the files are saved on your computer, then use the explicit installer
for your particular platform described above.
1.2 Installation Notes
Currently, the Windows and Linux installers are robust. We have had
mixed success with Mac OS and Solaris.
1.3 Downloading just the MAExplorer.jar file after initial install
If you have previously done an installation, you may avoid a complete
re-installation download by getting just the latest Java MAExplorer.jar file. You should replace the
old version of this file on your system with the one you are
downloading. This will work if the new MAExplorer.jar file does not
depend on any new entries in the configuration files (which is generally
the case - try it and see what happens).
.mae startup file                           Data set contents
------------------------------------------  -----------------------------------------------
Lact-C57vsStat5a-5probes.mae                5 probes. (X,Y) is lactation day 1 (C57B6, Stat5a(-,-))
Lact-C57vsStat5aCEBPnull-19probes.mae       19 probes. (X,Y) subset is lactation day 1 (C57B6, Stat5a(-,-) + CEBP-null)
Lact1-C57vsStat5a-38probes.mae              38 probes. (X,Y) subset is lactation day 1 (C57B6, Stat5a(-,-))
Lact1vs10-38probes.mae                      38 probes. (X,Y) subset is C57B6 lactation day (1,10)
MAEstartupDefault.mae                       No initial samples loaded
Preg-C57vsStat5a-4probes.mae                4 samples. (X,Y) is pregnancy (C57B6, Stat5a(-,-))
Preg-C57vsStat5a-8probes.mae                8 samples. (X,Y) is pregnancy (C57B6, Stat5a(-,-))
Preg13VsLact1-38probes.mae                  38 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-))
Preg13day-C57vsStat5a-19probes-cache.mae    19 samples from MGAP Web server. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-))
Preg13day-C57vsStat5a-19probes.mae          19 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-))
Preg13day-C57vsStat5a-38probes.mae          38 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-))
Preg13day-Stat5aVsCEBP-null-38probes.mae    19 samples. (X,Y) subset is pregnancy (Stat5a(-,-),CEBP-null)
reuseXY-Preg-C57vsStat5a-8probes.mae        Same as other startup, but uses XY coordinates of 1st sample
reuseXY-Preg13day-C57vsStat5a-38probes.mae  Same as other startup, but uses XY coordinates of 1st sample
C57vsDevModels-15probes-cache.mae           15 samples from MGAP cache. (X,Y) subset is (C57B6, knock-outs)
C57vsDevModels-15probes.mae                 15 samples. (X,Y) subset is (C57B6, knock-outs)
C57vsDevModels-38probes.mae                 38 samples. (X,Y) subset is (C57B6, knock-outs)
MGAP-50samples.mae                          50 samples. All of the public samples sorted alphabetically
2.2 Starting MAExplorer Using a ".mae" Startup File
If you are on a Windows 95/98/NT/2000/XP system, simply click on the
.mae file you want to use. Hint: you might put a short-cut to the
installation-directory\MAE\ directory on your desk-top
to make it more convenient to find the files.
cd installation-directory/
MAExplorer.bin MAE/Preg13VsLact1-38probes.mae
2.3 The MAExplorer Error Log File
Each time you run MAExplorer, it creates or overwrites the previous
error log file called MAEerr.log in the
installation-directory. If you are experiencing major problems,
this file helps us figure out what is
wrong. Otherwise, just ignore it.
2.4 Problems installing MAExplorer on some operating systems
2.5 FAQ of problems using MAExplorer on Mac OS for NCI/CIT mAdb users
Q: How many characters can I use in array names for data to be downloaded
to MAExplorer?
Open Stuffit Expander by double clicking its icon
Click on menu File -> Preferences
Click on "Cross Platform"
Click on "Never" button of 'Convert text file to Macintosh format:'
Your .zip will be decompressed properly and the text files from your mAdb
data can now be open by MAExplorer.
2.6 Sun Solaris (or other Unix system) Memory Problems
We have on occasion seen the following types of memory errors. This
discusses how to handle them.
MAExplorer Stack size Memory Error on Sun Solaris
Running MAExplorer on a Solaris (or other Unix system) may
produce this error:
% MAExplorer
Stack size of 97664 Kb exceeds current limit of 8192 Kb.
(Stack sizes are rounded up to a multiple of the system page size.)
See limit(1) to increase the stack size limit.
If the Sun (under Solaris) is slow in loading MAExplorer or has memory
errors (shown above), first check what the memory limits are
set to on your machine using the "limit" command. If they are too
small they should be increased or set to "unlimited" (see 2.4 above).
MAExplorer LAX file
If the problems persist, one might have to edit the MAExplorer.lax
file found in the MAExplorer directory (see example below). The default
memory settings in the MAExplorer.lax file (found in the installation
directory) should be no larger than the total memory of the machine or
paging problems will occur. For instance, if you have 192Mb of memory
in your Sun, edit the
"lax.nl.java.option.native.stack.size.max" and
"lax.nl.java.option.java.heap.size.max" options to be under
192Mb. You can use any text editor to do this. More memory may
need to be installed on your Sun to run MAExplorer with very large
datasets.
The Lax file is a startup file generated by InstallAnywhere when we
packaged MAExplorer. It is used when MAExplorer starts up on your
computer. We currently set the memory limits to 256Mbytes. If you have more
memory, you can edit the Lax file to have it use more memory.
# LAX.NL.JAVA.OPTION.JAVA.HEAP.SIZE.MAX
# -------------------------------------
lax.nl.java.option.java.heap.size.max=256000000
# LAX.NL.JAVA.OPTION.NATIVE.STACK.SIZE.MAX
# ----------------------------------------
lax.nl.java.option.native.stack.size.max=256000000
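Since the LAX entries shown above are key=value lines with # comments, they can be read with java.util.Properties. The following is a minimal sketch, not part of MAExplorer or InstallAnywhere: the class LaxMemoryCheck and its method are hypothetical, and it assumes the two values are byte counts, as the 256000000 setting for 256Mbytes suggests. It reports whether both limits are below a given amount of machine memory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper; not part of MAExplorer or InstallAnywhere. */
final class LaxMemoryCheck {
    static final String HEAP_KEY = "lax.nl.java.option.java.heap.size.max";
    static final String STACK_KEY = "lax.nl.java.option.native.stack.size.max";

    /** Returns true if both limits in the LAX text are below totalMemoryBytes. */
    static boolean fitsInMemory(String laxText, long totalMemoryBytes) {
        Properties lax = new Properties();
        try {
            lax.load(new StringReader(laxText)); // lines starting with '#' are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A missing key is treated as 0 (assumption made for this sketch only).
        long heap = Long.parseLong(lax.getProperty(HEAP_KEY, "0").trim());
        long stack = Long.parseLong(lax.getProperty(STACK_KEY, "0").trim());
        return heap < totalMemoryBytes && stack < totalMemoryBytes;
    }
}
```

With the default settings shown above, such a check would fail for a machine with 192Mb of memory, which is the case where the text recommends lowering both values.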
The MicroArray Explorer MAEPlugins Home page
Plugin Tutorial Examples |
List of Plugins |
Developing a Plugin |
Installing Plugins |
MAExplorer home |
MAExplorer revision notes |
Help desk
The Open Java API is available as the set of MJAxxxx classes in the
MAExplorer.jar file.
1. The MAEPlugins home page
This Web page includes documentation, an Open Java API with javadoc
documentation, open source Java source code
1.1 MAEPlugins are grouped into three types of implementations
These allow various degrees of portability and server independence.
2. Open Java API
(Applications Programming Interface)
The MAExplorer Open Java API (Applications
Programming Interface) allows users to access all data structures
without having to understand the low level internal details of the
MAExplorer system.
3. Distribution of MAExplorer Plugins
The distribution system for MAEPlugins is very flexible. There are
several options for distributing Plugins on this Web site including:
4. Lists of MAEPlugins being made available
5. How to write a MAEPlugin
Design of MAEPlugins
1. Overview of MAEPlugin design
The MAExplorer Open Java API (Applications Programming Interface)
allows users to access almost all data structures without
understanding the details of the system. Specialized
interfacing classes (MJAxxxx), organized by function, are accessed
from the MaeJavaAPI class. The MJAxxxx classes map internal data to
user data in a protected manner. Users do not have direct access to
internal MAExplorer data structures. However, MAEPlugins do have
access to relevant data. This allows us to improve and change the
internal data structures without causing problems with plugins using
those data structures. The following figures show the top level
plugin design.
2. Example of using a Plugin
This shows a short demonstration of what is involved in using a
MAEPlugin. The user first loads the plugin from the disk. Generally the
plugin's .jar or .class files are stored in the Plugins/ directory
where you have installed MAExplorer. Loading a particular
plugin installs it in the Plugins pull-down menu. The user then
revisits that menu to invoke the particular plugin. You may load any
number of plugins (until you run out of computer memory if that should
occur).
MAEPlugins Open Java API
The first, docsOJAPI/, is the
entire API accessible to the plugin writer including the MAEPlugin
classes required to extend your plugins. However, many of the
MJAxxxx methods are not normally called explicitly by the plugin
writer. Instead, a subset of classes, docsMJA/ constituting the MaeJavaAPI set
of classes, is the library of access methods that the plugin writer
normally uses.
1. List of MaeJavaAPI (MJA) classes
The MJA classes are organized by function. For example, if you want to
access data and methods on samples, then go to the MJAsample or
MJAsampleList classes. See the javadocs for the Open Java API for
details. The detailed descriptions of these classes are available in
the docsMJA javadocs.
MJAxxxx Class Objects and method access
------------- -------------------------
MJAbase base class and constants used by other MJA classes
MJAcluster cluster data structures and methods
MJAcondition condition lists of samples and ordered lists of condition lists
MJAeval command interpreter to invoke MAExplorer commands
MJAexprProfile expression profiles data
MJAfilter data filters
MJAgene single gene data
MJAgeneList lists of genes and get sets
MJAgenomicDB genomic databases on the Internet
MJAgeometry array geometry, spot to gene maps, etc.
MJAhelp popup browser help methods
MJAhistogram histogram plots
MJAmath built-in math functions
MJAnormalization normalization data and methods
MJAplot scrollable 2D plot support [Future]
MJAproperty get and put individual properties
MJApropList get lists of properties
MJAsample get and put single sample top-level data
MJAsampleList get lists of samples top-level data
MJAscrollablePlot scrollable 2D plot support [Future]
MJAsort built-in sort methods
MJAstatistics built-in statistics methods
MJAstate get and save state, get additional state info
MJAutil built-in utility methods
How To Write a MAEPlugin using the Open Java API
1. Using a Java development environment to develop and debug a Plugin
We have designed the MAExplorer.jar file so that it contains both
MAExplorer and the Open Java API. All MAExplorer classes are compiled
with the symbol table so that it may be used in a debugger. We use
Sun's Forte for Java (Community
Edition), which is a free development environment (IDE) available over
the Internet. Forte (now known as "SunONE"), like most other IDEs, allows
you to "mount" a jar file. So to create a new plugin you would:
2. Installing your MAEPlugin in a working MAExplorer environment
Tutorial Examples of MAEPlugins
1. Example of a simple PopupPlugin and how it uses the Open Java API
The following code illustrates how to create a simple popup plugin ExamplePlugin.java using the Open Java
API by extending the PopupPlugin. It passes the MaeJavaAPI mja
instance to the actual workhorse, Example.java, that then retrieves and saves
any MAExplorer data it requires. We show very simple examples of this
code to give the flavor of the procedures required and how it
interfaces with the API.
1.1 Example of plugin class that is loaded into MAExplorer
For convenience, we will name the class that is loaded into MAExplorer
XxxxxPlugin.java and the subsequent primary body of the plugin class
Xxxxx.java where Xxxxx is some particular class. In our following
example, Xxxxx is "Example", but it might be "MyNewClusterMethod" etc.
We first show ExamplePlugin.java that serves as the interface between
MAExplorer and the primary body class
Example.java.
1.1.1. You must import the two class definitions:
import MAEPlugin.popup.PopupPlugin;
import MAEPlugin.*;
If you are writing other types of plugins, you need to import those
instead (eg. MAEPlugin.analysis.NormalizationPlugin,
MAEPlugin.analysis.FilterPlugin, etc).
1.1.2 The XxxxxPlugin.java class must have the following methods as a
minimum:
/** File: ExamplePlugin.java */
import MAEPlugin.popup.PopupPlugin;
import MAEPlugin.*;
/**
* This class invokes the ExamplePlugin plugin.
*/
public class ExamplePlugin extends PopupPlugin implements MAEUpdateListener
{
/** The current instance of a plugin called "Example".
* The instance may be non-null if run previously and is needed to kill
* a previous instance when new instances are created.
*/
private Example
eObj= null;
/**
* ExamplePlugin() - this is the constructor end-users must implement
* to use the API. It is called at the time the plugin is loaded.
*/
public ExamplePlugin() throws PluginException
{ /* ExamplePlugin */
/* Note: "Example plugin" is a string that appears in the
* Plugin menu.
*/
setMenuLabel("Example plugin");
MJApopupRegistry
pr= MAExplorer.mja.mjaPopupRegistry;
int
propBits= (pr.PRPROP_CUR_GENE | pr.PRPROP_FILTER | pr.PRPROP_LABEL |
pr.PRPROP_SLIDER | pr.PRPROP_UNIQUE);
pr.addUniquePopupWindowToReg(this, "ShowListActiveFilters", propBits);
} /* ExamplePlugin */
/** pluginMain() - the method end-users must implement to use the API.
* It is invoked when the user selects the plugin in a menu.
*/
public void pluginMain()
{ /* pluginMain */
MaeJavaAPI
mja= MAExplorer.mja; /* Open Java API library access */
if(eObj==null)
eObj= new Example(mja);
else
{ /* re-rerun Example on new data */
eObj.dispose();
eObj= null;
System.gc();
mja.mjaUtil.maeRepaint();
eObj= new Example(mja);
}
} /* pluginMain */
/** updateCurGene() - update any data since current gene has changed.
* This is invoked by the MAExplorer PopupRegistry.
* @param mid is the MID (Master Gene ID) that is the new current gene.
*/
public void updateCurGene(int mid)
{
if(eObj!=null)
eObj.updateCurGene(mid);
}
/** updateFilter() - update any dependent data since the data Filter
* has changed. This is invoked by the MAExplorer PopupRegistry.
*/
public void updateFilter()
{
if(eObj!=null)
eObj.updateFilter();
}
/** updateSlider() - update any dependent data since a threshold slider
* has changed. This is invoked by the MAExplorer PopupRegistry.
*/
public void updateSlider()
{
if(eObj!=null)
eObj.updateSlider();
}
/** updateLabels() - update any dependent data since global labels
* have changed. This is invoked by the MAExplorer PopupRegistry.
*/
public void updateLabels()
{
if(eObj!=null)
eObj.updateLabels();
}
/**
* close() - close the plugin. This will be called if you
* had specified the plugin as PRPROP_UNIQUE since previous
* instances will be closed before the new instance is started.
* @param preserveDataStructuresFlag to save data structures
*/
public void close(boolean preserveDataStructuresFlag)
{
if(eObj!=null)
eObj.close();
}
} /* end of class ExamplePlugin*/
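The propBits value built in the constructor above combines several PRPROP_ flags with a bitwise OR. As a minimal sketch of how such a bitmask can be tested, the following uses hypothetical flag values; the real PRPROP_ constants are defined by MAExplorer and may differ, and the class PropBitsSketch and method wants() are assumptions for illustration only.

```java
/** Hypothetical flag values for illustration; the real PRPROP_ constants
 *  are defined by MAExplorer and may differ. */
final class PropBitsSketch {
    static final int PRPROP_CUR_GENE = 1;       // assumed value
    static final int PRPROP_FILTER   = 1 << 1;  // assumed value
    static final int PRPROP_LABEL    = 1 << 2;  // assumed value
    static final int PRPROP_SLIDER   = 1 << 3;  // assumed value
    static final int PRPROP_UNIQUE   = 1 << 4;  // assumed value

    /** True if every bit of flag is set in propBits. */
    static boolean wants(int propBits, int flag) {
        return (propBits & flag) == flag;
    }
}
```

A registry built along these lines could call updateFilter() only on plugins for which wants(propBits, PRPROP_FILTER) is true, which is one plausible reason for passing the flags at registration time.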
1.2 Example of the main body of plugin code
The main body of code the plugin writer generates is illustrated here
showing how one might access data and methods from the Open Java API.
We illustrate this with a very simple example, Example.java, showing the entry point and
retrieving a few data structures from the Open Java API. In this
example, we will popup a new Frame and add Action and Window listeners
(the code to support the Frame is not shown since that is not the point of
this example). However, any Java code could be used.
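/** File: Example.java */
import java.awt.Frame;
import java.awt.event.ActionListener;
import java.awt.event.WindowListener;
public class Example extends Frame
implements ActionListener, WindowListener /* listener methods not shown */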
{
/** Example() - Constructor
*/
public Example(MaeJavaAPI mja)
{ /* Example */
/* [1] Access Open Java API required through MaeJavaAPI instances
* of these MJA classes.
*/
MJAfilter
mjaFilter= mja.mjaFilter; /* Open Java API library */
MJAgeneList
mjaGeneList= mja.mjaGeneList; /* Open Java API library */
MJAproperty
mjaProperty= mja.mjaProperty; /* Open Java API library */
MJAsampleList
mjaSampleList= mja.mjaSampleList; /* Open Java API library */
/* [2] Get the data */
String
sR= "Example of some data accessed from MAExplorer\n",
maePrjPath= mjaProperty.getMaeCurProjectPath(),
maeBrowserTitle= mjaProperty.getMaeBrowserTitle(),
maeDatabase= mjaProperty.getMaeDatabaseTitle(),
maeDbSubset= mjaProperty.getMaeDbSubsetTitle();
String
sActive[]= mjaFilter.getListFilterNames();
int
nActive= sActive.length;
sR += " LIST OF ACTIVE FILTERS\n";
for(int i=0;i<nActive;i++)
if(sActive[i]!=null)
sR += " " + sActive[i] + "\n";
int
nSamples= mjaSampleList.getNbrHPsamples();
String
sampleNames[]= mjaSampleList.getHP_Elist_SampleNames();
sR += " LIST OF SAMPLES\n";
for(int i=0;i<nSamples;i++)
sR += sampleNames[i] +"\n";
int
filteredMIDlist[]= mjaGeneList.getMIDindicesForFilterGeneList(),
nFilteredGenes= filteredMIDlist.length;
String
filteredGeneNames[]=
mjaGeneList.getGeneFieldDataFromGeneList("workingCL", "GeneName");
sR += " LIST OF FILTERED GENES\n";
for(int i=0;i<nFilteredGenes;i++)
sR += "Gene ["+filteredMIDlist[i]+"] = "+filteredGeneNames[i]+"\n";
System.out.println(sR); /* print to java console */
} /* Example */
/* In this example, no actions are taken on popup registry events.
* However, the methods must exist in the code.
*/
public void updateCurGene(int mid) { }
public void updateFilter() { }
public void updateSlider() { }
public void updateLabels() { }
public void close() {this.dispose(); }
} /* end of class Example.java */
List of All MAEPlugins By Origin and Analysis Method
1. List of all MAEPlugins sorted alphabetically
1.1 Alpha-level MAEPlugins (not fully developed)
1.2 Stable MAEPlugins
2. List of MAEPlugins by analysis method
2.1 Access outside servers to acquire data
2.2 Connections to servers
2.3 Clustering methods
2.4 Data filtering methods
2.5 Normalization methods
2.6 Plot methods
2.7 Report methods
2.8 Sample and condition list manipulation methods
2.9 Visualization methods
2.10 Other methods
3. List of MAEPlugins by links to other Web sites
This list contains links to other Web sites where you may obtain the
Jar files and (or) Java source and documentation for MAEPlugins from
that provider. No MAEPlugins will be kept on this server unless
the source code is included.
Cvt2Mae Data Converter
Cvt2Mae Basics
In order to use the MAExplorer data-mining tool
on your cDNA or oligo tab-delimited array data, you must convert your
data files into the data formats described in Appendix C and Appendix D of the MAExplorer reference
manual. Although this may be done by editing your data files by
hand into the required formats, it is a non-trivial process. Therefore
we have developed a "wizard" conversion tool called Cvt2Mae to
automate these conversions.
Cvt2Mae Data Conversion Steps and Tutorials
There is a detailed description on using the
Cvt2Mae converter that provides the level of detail you need to
use it effectively. In addition, a step by step example is provided
converting Affymetrix data.
Tutorials
Downloading latest Cvt2Mae Version
You may freely download and install the current release of the Cvt2Mae
stand-alone application. You are free to use or redistribute
Cvt2Mae. You may want to review the revision history.
Update Cvt2Mae Program from maexplorer.sourceforge.net
As of version 0.71.1 of Cvt2Mae, it is now possible to update the
Cvt2Mae program from the program itself - rather than having to
download the complete installer and then running the installer. Press
the "Update Cvt2Mae" button at the lower left corner of the Cvt2Mae window
when it is running. It asks if you want to update Cvt2Mae. Answer
yes. This will then (1) back up the current Cvt2Mae.jar file as
Cvt2Mae.jar.bkup in the directory where you had initially installed
Cvt2Mae, and (2) copy the latest Cvt2Mae.jar file from the
maexplorer.sourceforge.net Web site to replace your working
Cvt2Mae.jar file in your installation directory. You must restart
Cvt2Mae for this to take effect. It will then use the new version of
the program. This is much less time consuming than doing
an entire download and reinstallation from the Web site.
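The backup-then-replace sequence described above can be sketched with the standard java.nio.file API. This is not Cvt2Mae's own update code: the class JarUpdater and its method are hypothetical, and the sketch leaves open where the new jar's bytes come from (for example, a stream opened on a download URL).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of a backup-then-replace update; not Cvt2Mae's code. */
final class JarUpdater {
    /**
     * Copies jar to a sibling file named <jar name>.bkup, then overwrites
     * jar with the bytes read from newJar. Returns the backup path.
     */
    static Path update(Path jar, InputStream newJar) throws IOException {
        Path backup = jar.resolveSibling(jar.getFileName() + ".bkup");
        Files.copy(jar, backup, StandardCopyOption.REPLACE_EXISTING); // step (1)
        Files.copy(newJar, jar, StandardCopyOption.REPLACE_EXISTING); // step (2)
        return backup;
    }
}
```

Taking the backup first means the previous jar file can be restored by hand if the new copy turns out to be incomplete or incompatible.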
Frequently Asked Questions (FAQ)
Here are some questions you might have about the Cvt2Mae data converter.
If Additional Help is Needed
Before emailing us for help, please read these Cvt2Mae Web pages to
ensure that you have set the parameters correctly and have the raw
data in the correct format. You might also read the Appendix C of the MAExplorer
manual. MAExplorer and Cvt2Mae also create log files that might be helpful
in troubleshooting.
Cvt2Mae Data Conversion Steps Description
The Cvt2Mae "Wizard"
You should use the Cvt2Mae program to convert your data to the
MAExplorer format. Cvt2Mae has a multi-step process (wizard) that
allows you to create an Array Layout that describes your data. One
could edit the raw data files by hand but this is tedious and prone to
errors. NOTE: The step titles below are links to in-depth
descriptions.
Status Window and Help
There are 3 message areas at the bottom of the Cvt2Mae window that are
used for reporting error and status messages. If certain parameters
are not consistent, error messages will appear in the message area
along with suggestions on how to correct the problem.
Generation of a pseudoarray geometry if no array geometry is specified
MAExplorer requires that the data in the GIPO and Quant files be specified
by spot position. This is indicated by the array spot geometry
(#fields, #grids, #rows/grid, #columns/grid). The #fields is the
number of duplicated sets of grids if available - it is 1
otherwise. This 4-tuple must be specified in the Configuration file.
However, some array data do not have spot geometry position data
available. The alternative is to generate a pseudoarray geometry. This
is possible since the pseudoarray image in MAExplorer is used simply
to indicate success of the data filter or relative differences
depending on the "Plot | Show Microarray" option. The algorithm
presented below will generate a geometry
(nGrids,nGridRows,nGridCols) that is compatible with the
visual use of the pseudoarray. The only input required is
nRowsExpected, the number of spots in the microarray (rows in
the database input file). The number of spots in the array is computed
automatically and the option to use the pseudoarray instead of the
actual array geometry is selected in the
Edit Layout Wizard for Grid Geometry.
OPT_GRID_SIZE = 1200; /* Optimal grid size for MAExplorer viewing */
ROWS_TO_COLS_ASPECT_RATIO = 3.0/4.0; /* desired rows/cols aspect ratio for a grid */
extra = 0; /* # of extra grid cols required */
n = nRowsExpected; /* # of spots in the microarray */
/* Estimate # of grids. Assume a square aspect ratio */
if(n <= OPT_GRID_SIZE)
nGrids = 1;
else
nGrids = (n / OPT_GRID_SIZE)+1;
/* Estimate rows (r) and columns (c) from a rectangular grid
* where cols = (4/3) rows.
* Then, c = (4/3)r and r*c= area.
* Then (4/3)*r*r = area or
* r = sqrt((3/4)*area).
*/
if(nRowsExpected > 0)
while(true)
{ /* iterate to optimal size */
gridSize = n/nGrids;
nGridRows = sqrt( ROWS_TO_COLS_ASPECT_RATIO * gridSize );
nGridCols = (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
nGridCols += extra;
estTotSize = (nGrids * nGridRows * nGridCols);
if(estTotSize > nRowsExpected)
break;
else
extra++; /* keep trying until meet criteria */
} /* iterate to optimal size */
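The pseudocode above can be written as a small runnable Java sketch. It is a reading of the pseudocode, not the Cvt2Mae source: the class name PseudoArrayGeometry is hypothetical, n is assumed to equal nRowsExpected, fractional values are assumed to truncate to integers, and a minimum of one row and one column is added so that very small inputs do not produce an empty grid.

```java
/** Hypothetical sketch of the pseudoarray geometry estimate; not Cvt2Mae's code. */
final class PseudoArrayGeometry {
    static final int OPT_GRID_SIZE = 1200;                     // optimal grid size for viewing
    static final double ROWS_TO_COLS_ASPECT_RATIO = 3.0 / 4.0; // desired rows/cols per grid

    /** Returns {nGrids, nGridRows, nGridCols} for the given number of spots. */
    static int[] compute(int nRowsExpected) {
        if (nRowsExpected <= 0) {
            throw new IllegalArgumentException("nRowsExpected must be positive");
        }
        // Estimate the number of grids from the optimal grid size.
        int nGrids = (nRowsExpected <= OPT_GRID_SIZE)
                ? 1 : (nRowsExpected / OPT_GRID_SIZE) + 1;
        int gridSize = nRowsExpected / nGrids;

        // With cols = (4/3) rows and rows * cols = area, rows = sqrt((3/4) * area).
        int nGridRows = Math.max(1, (int) Math.sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize));
        int baseCols = Math.max(1, (int) (nGridRows / ROWS_TO_COLS_ASPECT_RATIO));

        // Add extra columns until the total capacity exceeds the number of spots,
        // matching the "estTotSize > nRowsExpected" stopping test of the pseudocode.
        int extra = 0;
        while ((long) nGrids * nGridRows * (baseCols + extra) <= nRowsExpected) {
            extra++;
        }
        return new int[] {nGrids, nGridRows, baseCols + extra};
    }
}
```

Under these assumptions, compute(1000) yields one grid of 27 rows by 38 columns (1026 positions), so a few pseudoarray positions remain unused.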
Example of Using Cvt2Mae to convert Some Affymetrix data for MAExplorer
1. Specifying the input data files
Figure 1. shows the
Affymetrix tab-delimited data in Excel.
Figure 2. Initial state of
the Cvt2Mae Program.
Figure 3.
Selecting a Chipset Array Layout.
Figure 4.
Selecting one or more user input data files by pressing the "Browse
input file name" button. Then select a user input data file using the
file browser.
Figure 5. Files
selected by user and samples "discovered" in the data file.
2. Editing the array layout
Figure 6. Edit
Layout Wizard for name of the Array Layout with A) original and
B) the new layout name.
Figure 7. Edit
Layout Wizard for Grid Geometry.
Figure 8.
Edit Layout Wizard for Starting Data Rows.
Figure 9.
Edit Layout Wizard for Ratio or Intensity data.
Figure 10. Edit Layout
Wizard for optional (X,Y) spot coordinates available in the input
data.
Figure 11.
Edit Layout Wizard for optional Genomic ID values available in the
input data.
Figure 12. Edit
Layout Wizard for optional Gene Names available in the data.
Figure
13. Edit Layout Wizard for optional calibration DNA available in
the data and UniGene species prefix.
Figure 14. Edit
Layout Wizard for optional user names for Project, Database,
Sub-database, etc.
Figure 15.
Edit Layout Wizard for optional HP-X and HP-Y 'set' experimental class
(i.e. condition) names.
Figure 16.
Edit Layout Wizard for changing the default data filter threshold
slider values.
2.1 Specifying the mapping between your data file fields and those
required by MAExplorer
There are two special wizards for specifying the mapping of array layout GIPO and
Quant input data field names. These mappings allow the converter
to take your data specified in some columns (i.e. Fields) of your data
input file and use it to generate standard MAExplorer output files.
Figure 17. shows the
Edit Layout Wizard for "Assign GIPO fields" used to generate the
MAExplorer GIPO data file.
Figure 18. shows the
Edit Layout Wizard for "Assign Quant fields" used to generate the
MAExplorer Quant files (one for each hybridized sample).
Figure 19.
shows saving modified Array Layout if you have made changes.
3. Generating the output data files for use with MAExplorer
Finally, the array layout has been defined and we can run the converter.
Figure 20. Selecting the output folder in which to save the
converted files.
Figure 21. Browse to select the output folder in which to save the
converted files.
Figure 22.
shows the interface after selection of the output file folder using a
file browser.
Figure
23. shows the conversion being performed after the user pressed
the RUN button.
Figure 24.
shows the conversion summary instructions after the conversion is
finished.
Figure
25. shows the files that are generated by Cvt2Mae for use by
MAExplorer.
Figure 26. Starting MAExplorer on the converted data by clicking
on the Start.mae file. Note the location of the "MAExplorer startup
file:" in Figure 8. Go to that file and click on it to start
MAExplorer. Alternatively, start MAExplorer and do "File | Open Disk
DB" and open that file to start it up.
B)
Mapping Array Layout GIPO and Quant input data field names
In addition to the above global definitions, additional Array Layout
parameters need to be defined. These are mappings of input file data
field names for GIPO and Quant data to the names required by
MAExplorer. There are two wizards for helping define these mappings.
For the predefined Array Layouts these are already setup but may need
to be defined or edited for user-defined data.
Downloads for The MicroArray Explorer Project
Types of download files available
You may download program installers for your particular computer for
both MAExplorer and Cvt2Mae. You may also download executable JAR
files for the MAEPlugins. There is a discussion of the program
installer process for MAExplorer; the same procedure is used for
installing Cvt2Mae. The source code is also available. Individual
files are available in the CVS directories listed in the table below
(see the instructions on using CVS to access these files directly).
Gzipped tar archives of the source code are also
available on the SourceForge.net site.
Table 1. Access to MAExplorer, MAEPlugins, and Cvt2Mae from either Web
server
Program | Installer Version | Update Program Version | Program installers | Information on installing | Source | Jar file(s)
MAExplorer | 0.96.34.01 | 0.96.34.01 | MAExplorer | installing MAExplorer | source code | MAExplorer.jar
MAEPlugins | - | - | (not required) | Using MAEPlugins | source code | List of MAEPlugins
Cvt2Mae | 0.73 | 0.73 | Cvt2Mae | installing Cvt2Mae | source code | Cvt2Mae.jar
Mammary Genome Anatomy Program (MGAP) public data set
You may also download the Mammary Genome Anatomy Program (MGAP)
public data set that can be used with MAExplorer. There is also a
list of PDF documents describing MAExplorer and the Cvt2Mae data
conversion wizard that may be downloaded.
Table 2. Download the Mammary Genome Anatomy Program (MGAP) public
data set
The Mammary Genome Anatomy Program (MGAP), which uses mouse models,
has made available a public data set of 50 samples that may be
downloaded and used with MAExplorer or other types of analysis. The
hybridized sample data consist of tab-delimited files (no images) for
about 1700 duplicate spots/membrane. A list of startup .mae files is
included in the download. You may download it in several different
ways.
Download method | Web address
A single gzip file from SourceForge | SourceForge.net: MGAP-Array-database.tar.gz
As separate files | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database/
A single zip file | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.zip
A single tar file | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.tar
Upgrading the MAExplorer or Cvt2Mae JAR program files after the initial installation
If you want to upgrade your installation to the latest JAR files,
simply download the JAR files and save them wherever you have
installed the programs, replacing the previous JAR files.
For example, in a typical Windows OS installation, the
MAExplorer.jar (or Cvt2Mae.jar) file is installed in the
C:\Program Files\MAExplorer\ (or C:\Program
Files\Cvt2Mae\) folder. Alternatively, you can update the JAR
file from within the running MAExplorer or Cvt2Mae program as
described in the next section.
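The manual replacement described above can be sketched in Java as
follows. This is a minimal sketch rather than anything shipped with
MAExplorer: the class and method names are hypothetical, and keeping a
.bkup copy of the old JAR file is an added precaution that the manual
procedure does not require.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: replace an installed JAR file with a downloaded one. */
public class JarUpgrade {

    /**
     * Copies downloadedJar over installedJar. If installedJar already exists,
     * it is first copied to a sibling file with ".bkup" appended to its name.
     * Returns the path of the backup file (which exists only if there was
     * a previous JAR file to back up).
     */
    public static Path replaceJar(Path downloadedJar, Path installedJar) throws IOException {
        Path backup = installedJar.resolveSibling(installedJar.getFileName() + ".bkup");
        if (Files.exists(installedJar)) {
            Files.copy(installedJar, backup, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.copy(downloadedJar, installedJar, StandardCopyOption.REPLACE_EXISTING);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarUpgrade <downloaded.jar> <installed.jar>");
            return;
        }
        Path backup = replaceJar(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Previous JAR (if any) saved as " + backup);
    }
}
```

On Windows, writing into C:\Program Files\ may require administrator
rights, so the program should be closed and the copy run with
sufficient privileges.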
Updating the MAExplorer or Cvt2Mae JAR files from the running programs
You can use the new "Update MAExplorer" command in the Files menu to
quickly download and install just the JAR file. It first prompts you
to verify that you want to update your program. Then it will (1)
back up the current MAExplorer.jar file as MAExplorer.jar.bkup; and
(2) copy the latest MAExplorer.jar file from the
maexplorer.sourceforge.net Web site, replacing the MAExplorer.jar
file in your installation directory. When you restart MAExplorer, it
will use the new version of the program.
MAEPlugin JAR program files
A Plugins-jar.tar file containing all of the released MAEPlugin jar
files is available. Simply unpack the archive using Unix tar or a
Windows unzip program and copy the .jar files into a directory you
can access when running MAExplorer. To let MAExplorer go directly to
these files when you use the (Plugins | Load plugins) menu command,
copy the .jar files into the Plugins/ directory where you previously
installed MAExplorer. For example, in a typical Windows OS
installation, this would be the
C:\Program Files\MAExplorer\Plugins\ folder.
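The copy step can also be scripted. The Java sketch below copies every
.jar file from an already unpacked directory into a Plugins directory,
creating that directory if needed. The class and method names are
hypothetical, and the sketch assumes the archive has been unpacked
beforehand with tar or an unzip program.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: copy unpacked plugin JAR files into a Plugins directory. */
public class PluginInstaller {

    /**
     * Copies every *.jar file found directly in unpackedDir into pluginsDir,
     * overwriting files of the same name. Returns the number of files copied.
     */
    public static int copyPluginJars(Path unpackedDir, Path pluginsDir) throws IOException {
        Files.createDirectories(pluginsDir);
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(unpackedDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, pluginsDir.resolve(jar.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PluginInstaller <unpacked dir> <Plugins dir>");
            return;
        }
        int n = copyPluginJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + n + " plugin jar file(s)");
    }
}
```

Only files directly inside the unpacked directory are considered; if
the archive unpacks into nested subdirectories, point the first
argument at the directory that actually holds the .jar files.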
Revision history of MAExplorer and Cvt2Mae
See the Revision notes for more information on the changes made to
MAExplorer and Cvt2Mae, including new features and corrected bugs.
Javadocs documentation views of the MAExplorer Project
View | Javadoc folder
Full javadocs (public+private) for MAExplorer | docsFull
Full javadocs (public only) for MAExplorer | docsAllPublic
Open Java API javadocs for MAExplorer | docsOJAPI
MaeJavaAPI (MJA) javadocs for MAExplorer | docsMJA
Full (public+private) javadocs for Cvt2Mae | javadocs
Mozilla Public License 1.1 (MPL 1.1)
1.0.1. "Commercial Use" means distribution or otherwise making
the Covered Code available to a third party.
A. Any addition to or deletion from the contents of a file containing
Original Code or previous Modifications.
1.10. ''Original Code'' means Source Code of computer software code
which is described in the Source Code notice required by Exhibit A
as Original Code, and which, at the time of its release under this License
is not already Covered Code governed by this License.
2. Source Code License.
2.1. The Initial Developer Grant.
The Initial Developer hereby grants You a world-wide, royalty-free,
non-exclusive license, subject to third party intellectual property claims:
(a) under intellectual property rights (other than
patent or trademark) Licensable by Initial Developer to use, reproduce,
modify, display, perform, sublicense and distribute the Original Code (or
portions thereof) with or without Modifications, and/or as part of a Larger
Work; and
(c) the licenses granted in this Section 2.1(a) and (b) are effective
on the date Initial Developer first distributes Original Code under the
terms of this License.
2.2. Contributor Grant.
Subject to third party intellectual property claims, each Contributor
hereby grants You a world-wide, royalty-free, non-exclusive license
(a) under intellectual property rights (other than
patent or trademark) Licensable by Contributor, to use, reproduce, modify,
display, perform, sublicense and distribute the Modifications created by
such Contributor (or portions thereof) either on an unmodified basis, with
other Modifications, as Covered Code and/or as part of a Larger Work; and
3. Distribution Obligations.
3.1. Application of License.
The Modifications which You create or to which You contribute are governed
by the terms of this License, including without limitation Section 2.2.
The Source Code version of Covered Code may be distributed only under the
terms of this License or a future version of this License released under
Section 6.1, and You must include a copy of this License with every
copy of the Source Code You distribute. You may not offer or impose any
terms on any Source Code version that alters or restricts the applicable
version of this License or the recipients' rights hereunder. However, You
may include an additional document offering the additional rights described
in Section 3.5.
3.2. Availability of Source Code.
Any Modification which You create or to which You contribute must be
made available in Source Code form under the terms of this License either
on the same media as an Executable version or via an accepted Electronic
Distribution Mechanism to anyone to whom you made an Executable version
available; and if made available via Electronic Distribution Mechanism,
must remain available for at least twelve (12) months after the date it
initially became available, or at least six (6) months after a subsequent
version of that particular Modification has been made available to such
recipients. You are responsible for ensuring that the Source Code version
remains available even if the Electronic Distribution Mechanism is maintained
by a third party.
3.3. Description of Modifications.
You must cause all Covered Code to which You contribute to contain
a file documenting the changes You made to create that Covered Code and
the date of any change. You must include a prominent statement that the
Modification is derived, directly or indirectly, from Original Code provided
by the Initial Developer and including the name of the Initial Developer
in (a) the Source Code, and (b) in any notice in an Executable version
or related documentation in which You describe the origin or ownership
of the Covered Code.
3.4. Intellectual Property Matters
(a) Third Party Claims.
If Contributor has knowledge that a license under a third party's intellectual
property rights is required to exercise the rights granted by such Contributor
under Sections 2.1 or 2.2, Contributor must include a text file with the
Source Code distribution titled "LEGAL'' which describes the claim and
the party making the claim in sufficient detail that a recipient will know
whom to contact. If Contributor obtains such knowledge after the Modification
is made available as described in Section 3.2, Contributor shall promptly
modify the LEGAL file in all copies Contributor makes available thereafter
and shall take other steps (such as notifying appropriate mailing lists
or newsgroups) reasonably calculated to inform those who received the Covered
Code that new knowledge has been obtained.
(b) Contributor APIs.
If Contributor's Modifications include an application programming interface
and Contributor has knowledge of patent licenses which are reasonably necessary
to implement that API, Contributor must also include this information in
the LEGAL file.
(c) Representations.
Contributor represents that, except as disclosed pursuant to Section
3.4(a) above, Contributor believes that Contributor's Modifications are
Contributor's original creation(s) and/or Contributor has sufficient rights
to grant the rights conveyed by this License.
3.5. Required Notices.
You must duplicate the notice in Exhibit A in each file of the
Source Code. If it is not possible to put such notice in a particular
Source Code file due to its structure, then You must include such notice
in a location (such as a relevant directory) where a user would be likely
to look for such a notice. If You created one or more Modification(s)
You may add your name as a Contributor to the notice described in Exhibit
A. You must also duplicate this License in any documentation
for the Source Code where You describe recipients' rights or ownership
rights relating to Covered Code. You may choose to offer, and to
charge a fee for, warranty, support, indemnity or liability obligations
to one or more recipients of Covered Code. However, You may do so only
on Your own behalf, and not on behalf of the Initial Developer or any Contributor.
You must make it absolutely clear than any such warranty, support, indemnity
or liability obligation is offered by You alone, and You hereby agree to
indemnify the Initial Developer and every Contributor for any liability
incurred by the Initial Developer or such Contributor as a result of warranty,
support, indemnity or liability terms You offer.
3.6. Distribution of Executable Versions.
You may distribute Covered Code in Executable form only if the requirements
of Section 3.1-3.5 have been met for that Covered Code, and if You
include a notice stating that the Source Code version of the Covered Code
is available under the terms of this License, including a description of
how and where You have fulfilled the obligations of Section 3.2.
The notice must be conspicuously included in any notice in an Executable
version, related documentation or collateral in which You describe recipients'
rights relating to the Covered Code. You may distribute the Executable
version of Covered Code or ownership rights under a license of Your choice,
which may contain terms different from this License, provided that You
are in compliance with the terms of this License and that the license for
the Executable version does not attempt to limit or alter the recipient's
rights in the Source Code version from the rights set forth in this License.
If You distribute the Executable version under a different license You
must make it absolutely clear that any terms which differ from this License
are offered by You alone, not by the Initial Developer or any Contributor.
You hereby agree to indemnify the Initial Developer and every Contributor
for any liability incurred by the Initial Developer or such Contributor
as a result of any such terms You offer.
3.7. Larger Works.
You may create a Larger Work by combining Covered Code with other code
not governed by the terms of this License and distribute the Larger Work
as a single product. In such a case, You must make sure the requirements
of this License are fulfilled for the Covered Code.
4. Inability to Comply Due to Statute or Regulation.
If it is impossible for You to comply with any of the terms of this
License with respect to some or all of the Covered Code due to statute,
judicial order, or regulation then You must: (a) comply with the terms
of this License to the maximum extent possible; and (b) describe the limitations
and the code they affect. Such description must be included in the LEGAL
file described in Section 3.4 and must be included with all distributions
of the Source Code. Except to the extent prohibited by statute or regulation,
such description must be sufficiently detailed for a recipient of ordinary
skill to be able to understand it.
5. Application of this License.
This License applies to code to which the Initial Developer has attached
the notice in Exhibit A and to related Covered Code.
6. Versions of the License.
6.1. New Versions.
Netscape Communications Corporation (''Netscape'') may publish revised
and/or new versions of the License from time to time. Each version will
be given a distinguishing version number.
6.2. Effect of New Versions.
Once Covered Code has been published under a particular version of
the License, You may always continue to use it under the terms of that
version. You may also choose to use such Covered Code under the terms of
any subsequent version of the License published by Netscape. No one other
than Netscape has the right to modify the terms applicable to Covered Code
created under this License.
6.3. Derivative Works.
If You create or use a modified version of this License (which you
may only do in order to apply it to code which is not already Covered Code
governed by this License), You must (a) rename Your license so that the
phrases ''Mozilla'', ''MOZILLAPL'', ''MOZPL'', ''Netscape'', "MPL", ''NPL''
or any confusingly similar phrase do not appear in your license (except
to note that your license differs from this License) and (b) otherwise
make it clear that Your version of the license contains terms which differ
from the Mozilla Public License and Netscape Public License. (Filling in
the name of the Initial Developer, Original Code or Contributor in the
notice described in Exhibit A shall not of themselves be deemed
to be modifications of this License.)
7. DISCLAIMER OF WARRANTY.
COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN "AS IS'' BASIS, WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION,
WARRANTIES THAT THE COVERED CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT
FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY
AND PERFORMANCE OF THE COVERED CODE IS WITH YOU. SHOULD ANY COVERED CODE
PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER
CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION.
THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE.
NO USE OF ANY COVERED CODE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER.
8. TERMINATION.
8.1. This License and the rights granted hereunder will
terminate automatically if You fail to comply with terms herein and fail
to cure such breach within 30 days of becoming aware of the breach. All
sublicenses to the Covered Code which are properly granted shall survive
any termination of this License. Provisions which, by their nature, must
remain in effect beyond the termination of this License shall survive.
9. LIMITATION OF LIABILITY.
UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING
NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL YOU, THE INITIAL DEVELOPER,
ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED CODE, OR ANY SUPPLIER
OF ANY OF SUCH PARTIES, BE LIABLE TO ANY PERSON FOR ANY INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT
LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE
OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN
IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.
THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR
PERSONAL INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE
LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS EXCLUSION
AND LIMITATION MAY NOT APPLY TO YOU.
10. U.S. GOVERNMENT END USERS.
The Covered Code is a ''commercial item,'' as that term is defined
in 48 C.F.R. 2.101 (Oct. 1995), consisting of ''commercial computer software''
and ''commercial computer software documentation,'' as such terms are used
in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and
48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government
End Users acquire Covered Code with only those rights set forth herein.
11. MISCELLANEOUS.
This License represents the complete agreement concerning subject matter
hereof. If any provision of this License is held to be unenforceable, such
provision shall be reformed only to the extent necessary to make it enforceable.
This License shall be governed by California law provisions (except to
the extent applicable law, if any, provides otherwise), excluding its conflict-of-law
provisions. With respect to disputes in which at least one party is a citizen
of, or an entity chartered or registered to do business in the United States
of America, any litigation relating to this License shall be subject to
the jurisdiction of the Federal Courts of the Northern District of California,
with venue lying in Santa Clara County, California, with the losing party
responsible for costs, including without limitation, court costs and reasonable
attorneys' fees and expenses. The application of the United Nations Convention
on Contracts for the International Sale of Goods is expressly excluded.
Any law or regulation which provides that the language of a contract shall
be construed against the drafter shall not apply to this License.
12. RESPONSIBILITY FOR CLAIMS.
As between Initial Developer and the Contributors, each party is responsible
for claims and damages arising, directly or indirectly, out of its utilization
of rights under this License and You agree to work with Initial Developer
and Contributors to distribute such responsibility on an equitable basis.
Nothing herein is intended or shall be deemed to constitute any admission
of liability.
13. MULTIPLE-LICENSED CODE.
Initial Developer may designate portions of the Covered Code as “Multiple-Licensed”.
“Multiple-Licensed” means that the Initial Developer permits you to utilize
portions of the Covered Code under Your choice of the NPL or the alternative
licenses, if any, specified by the Initial Developer in the file described
in Exhibit A.
EXHIBIT A - Mozilla Public License.
``The contents of this file are subject to the Mozilla Public License
Version 1.1 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/
Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
for the specific language governing rights and limitations under the
License.
Portions created by ______________________ are Copyright (C) ______
_______________________. All Rights Reserved.
LEGAL file - MAExplorer Software under the Mozilla Public License (V 1.1)
MAExplorer Software under the Mozilla Public License (version 1.1)
Date: 5-31-2002
List of Figures
All figures are available in low (default) and high resolution. Click
on an image to bring up the high resolution version.
List of Tables
Glossary of terms used in MAExplorer