Note: This hypertext manual is divided into chapters and appendices, each a separate Web page. These may be printed individually from your Web browser by (1) clicking in the text window to be printed, and (2) using "Print Frame" in Netscape or "Print" in Internet Explorer. Some of the chapters (e.g. 2) have many images. The entire manual may be downloaded at one time with low-resolution figures and is suitable for printing in the Web browser. You may also download an Adobe Acrobat PDF version of the entire manual with the lower-resolution figures (~5Mb). The Unix script for creating the full reference manual from the individual HTML pages is CreateMaeFullRefManual.do.
MAExplorer is a Java-based bioinformatics exploratory data-analysis and data-mining program for analyzing sets of quantitative spotted cDNA or oligonucleotide microarray data (Lemkin et al., 2000); see (Schulze, 2001) for a review of microarray technology.
Prior to its release on SourceForge, MAExplorer was developed by Dr. Peter Lemkin (LECB/NCI-Frederick) with help from Gregory Thornwall (SAIC) and Jai Evans (DECA/CIT, NIH). It was initially created for analyzing 33P labeled membrane array data from mouse mammary tissue from the Mammary Genome Anatomy Project (MGAP) http://mammary.nih.gov/ with the help of many researchers in the Laboratory of Genetics and Physiology, NIDDK under Dr. Lothar Hennighausen. Since the early work with MGAP it has been extended to work with other types of cDNA and oligo arrays and various nucleotide labeling methods. These include spotted Cy3/Cy5 glass slides, spotted membranes, non-geometric chip data, and other chip supports with different geometries and numbers of duplicate spots/gene, clones as well as oligo chip data such as Affymetrix. A wizard tool called Cvt2Mae was developed to make it easier for other researchers to convert their data to the format required by MAExplorer. Cvt2Mae was developed by Peter Lemkin, Greg Thornwall and Bob Stephens (ABCC/SAIC). You may extend the set of built-in analysis methods by writing Java plugins called MAEPlugins.
This document describes MAExplorer's functionality, provides tutorials, and contains documentation for using it with various types of arrays.
With this program, you may: 1) analyze expression of individual genes; 2) analyze expression of gene families and clusters; 3) compare expression patterns for multiple hybridized samples.
MAExplorer is written in Java and runs as a stand-alone application that you download to your computer. Although MAExplorer began as a Java applet for use with Web browsers for the MGAP Web database ( http://www.lecb.ncifcrf.gov/mae ), we have deprecated its use as an applet because of many problems with running large Java applets in some Web browsers. Instead, we recommend downloading MAExplorer, which includes the public MGAP array data as a demonstration data set, and then running MAExplorer on this data after you have installed it on your computer.
Notation: MAExplorer uses the notation that the sample
probe total mRNA is labeled and then hybridized against the
known cDNA targets tethered to the microarray. Because of this
notation, we refer to a hybridized sample as a HP. An alternative
notation that reverses these terms is also commonly used (see
"Chipping Forecast", Nature Genetics supplement, Jan, 1999,
pg 1). Also, because arrays may be constructed from either spotted
clones or oligonucleotides, we refer to hybridized chip DNA from any
of these sources generically as "genes".
Throughout this document we use the abbreviations HP for hybridized sample and GC for gene class. These and other terms are explained in the Glossary and Index. There are a number of figures and tables illustrating various features of MAExplorer throughout this manual. Figures are presented at low resolution. Clicking on a low-resolution figure displays its high-resolution version.
NOTES: Because MAExplorer is under development, there may be occasional problems with some of its functionality. There may also be some problems (mostly bad HTML links) caused by migrating from LECB/NCI to the SourceForge Web site. Some operations that are under development are labeled with "[Future]" in this manual. Occasionally the manual or the figures in the manual may not be quite in phase with the software. We welcome your suggestions for improvements and reports of problems that you encounter; please notify us by E-mail so we can try to fix or implement them. If you are a bioinformatics developer and would be interested in working with the MAExplorer project, consider joining the MAExplorer development team on SourceForge.net.
1. Introduction
1.1 Microarrays and notation used with MAExplorer
1.2 Microarray image quantification
1.2.1 Ratio and Zscore comparison of data from different hybridized samples
1.3 Microarray image and plot display
1.4 Exploratory data analysis - overview
1.4.1 Saving the state of a data-mining session in stand-alone mode
1.4.2 Logging messages and command history
1.5 Quick start - demonstration of MAExplorer
1.6 Tutorials for using MAExplorer
2. MAExplorer menus
2.1 File menu
2.1.1 Databases menu
2.1.2 Exploratory state menu
2.1.3 Groupware facility for sharing user states menu
2.2 Samples menu
2.2.1 Selecting sample HP with chooser or menu sample lists
2.2.2 Swapping selected samples' (Cy3,Cy5) channels in ratio data dye-swap experiments
2.2.3 Viewing sample HP-X, HP-Y, and HP-E partitions
2.2.4 Defining sample condition 'class' names
2.2.5 Toggling between single HP-X (-Y) samples and HP-X (-Y) sets
2.3 Edit menu
2.3.1 User edited gene list - the 'Edited Gene List' menu
2.3.2 Sets of genes menu
2.3.3 Sets of Sample Conditions menu
2.3.4 Setting user preferences menu
2.4 Analysis
2.4.1 GeneClass menu
2.4.1.1 GeneClass ontology subsets
2.4.1.2 Simulating Gene Class ontologies using Gene Set operations
2.4.2 Normalization menu
2.4.2.1 Intensity background correction
2.4.2.2 Normalization between microarrays to allow comparison
2.4.2.3 Using different normalizations to 'see' different data views
2.4.3 Filter menu
2.4.3.1 Data filtering using multiple gene data filters
2.4.4 Plot menu
2.4.4.1 Show microarray pseudoarray images menu
2.4.4.2 Scatter plots menu
2.4.4.3 Histogram plots menu
2.4.4.4 Expression profile plots menu
2.4.5 Cluster menu
2.4.5.1 Cluster genes with expression profiles similar to current gene
2.4.5.2 Cluster counts of similar filtered genes by expression profiles
2.4.5.3 K-means clustering of gene expression profiles for filtered genes
2.4.5.4 Hierarchical clustering of expression profiles
2.4.6 Report menu
2.4.6.1 Array report menu - hybridized samples global data
2.4.6.2 Gene reports menu
2.4.6.3 Table format menu
2.4.6.4 Table font size menu
2.5 View menu
2.5.1 Logging MAExplorer messages
2.5.2 Logging command history
2.6 Plugins menu
2.7 Help menu
3. Exploratory Data Analysis - Data Mining
3.1 Analysis objectives
3.1.1 Some experimental design issues of microarray experiments
3.1.2 Design philosophy of MAExplorer methodology
3.1.3 Evolution of MAExplorer from earlier proteomic data mining systems
3.1.4 Concepts used in data mining with MAExplorer
3.2 Steps in an analysis
3.2.1 Definition of expression profile
3.2.2 Clustering Methods
3.2.2.1 Clustering similar genes
3.2.2.2 K-means clustering
3.2.2.3 Hierarchical clustering
3.3 Display gene intensity and identification data measurements
3.4 Selecting subsets of genes using the data Filter
3.5 Selecting subsets of hybridized sample conditions
3.6 Setting threshold values using the state-scroller sliders
3.7 Exporting report and plot data
4. Status and Bugs of MAExplorer
4.1 Known Bugs in MAExplorer
4.1.1 Browser Applet Bugs
4.1.2 Downloading and Installer Bugs
4.1.3 Computation speed and display Bugs
4.1.4 User state and login Status
4.1.5 Data file names Bug
4.1.6 Gene Sets Bugs
4.1.7 Clustering Bugs
4.1.8 Expression profile Bugs
4.1.9 Data conversion problems
4.1.10 Java Plugins bugs
4.2 Revision notes
4.3 Web Browser problems when running MAExplorer as an applet
4.4 Handling fatal error reporting (i.e. DRYROT errors)
References to related exploratory data analysis methods
R.1 Nucleic Acids Res. paper (PDF)
R.2 Overview (PDF)
R.3 Examples (PDF)
R.4 Using mAdb data with MAExplorer (PDF)
R.5 Introduction to Data Mining with MAExplorer (PDF) or (PPT)
R.6 Using Cvt2Mae to convert array data for use with MAExplorer (PDF)
R.7 Statistics in Functional Genomics workshop paper (PDF)
R.8 Software design of the MAExplorer data mining tool (PDF) or (PPT)
Appendices
A. Short tutorial for MAExplorer
A.1 Demonstration data
A.2 General instructions
A.3 Self-guided tutorial of MAExplorer - notation and examples
C. Use of MAExplorer with user's microarray data
C.1 Creating quantified spot data files from hybridized sample arrays
C.2 Table of samples that can be loaded into MAExplorer
C.3 Quantified spot data file format
C.4 GIPO table database file format
C.5 Configuring MAExplorer for use with other arrays
C.6 Using the Cvt2Mae 'wizard' tool to convert array data for use with MAExplorer
D. Use of MAExplorer as a stand-alone application
D.1 Installing MAExplorer as stand-alone application
D.2 Downloading MAExplorer for stand-alone use with other arrays
D.3 Starting MAExplorer by clicking on a .mae file
D.4 The data file format for .mae files
D.5 Using MAExplorer as an Applet on your computer
D.6 List of startup .mae files included in the download installation
E. Design issues
E.1 Internal data structures design to facilitate direct manipulation
E.2 Approaches to data mining: client-centric and server-centric models
E.3 Conversion of microarray data files to MAExplorer format using Cvt2Mae
E.4 Extending MAExplorer functionality using Java Plugins
E.5 Web database server design
Download Installers
Installer information
MAExplorer Open Source
Download source
javadocs for source
MPL1.1 Public License
Legal
List of Figures
List of Tables
Glossary of terms used in MAExplorer
Index
Figure 1. Overview of MAExplorer exploratory data analysis system. Initial data preparation steps are performed prior to analysis by MAExplorer and are indicated by cyan italics at the top of the figure. The primary data consists of quantified microarray image data as well as corresponding qualitative clone ID, gene-in-plate-order (GIPO or print-table, etc.), gene name, hypertext base references and related information. After the microarrays are hybridized, they are scanned and spots quantified using image spot quantification programs. These lists are then saved for each array in a tab-delimited file. Microarray image quantification may be performed by various software such as Axon's GenePix(TM), Scanalyze, Molecular Dynamics ImageQuant(TM), Research Genetics' Pathways(TM), etc. When used as a stand-alone application, data may be saved on the local computer for local off-line use, and direct access to other Internet genomic databases may be made without using a proxy server.
[DEPRECATED: When used as an applet, these auxiliary databases and the MAExplorer Jar files are copied to the Web server or local file system (in the case of the stand-alone version) where they are then available to be downloaded by users. When a user invokes a Web page containing the Java applet, it first downloads the applet, which then downloads auxiliary databases including a configuration file that describes the array data. It then downloads the subset of quantified microarray spot data files requested for the set of hybridized samples being investigated. Additional samples may be downloaded at any time. When the user selects an operation that requires access to Web databases not residing on the MAExplorer Web server, implicit Java security restrictions prevent the applet from going directly to these other Web servers. Instead, it asks the MAExplorer proxy server to request the data from the foreign Web server, which then returns it to the user's Web browser. ]
Figure 1.1.1 Overview of MAExplorer exploratory data analysis system. MAExplorer is used as a stand-alone application on local data. [Its use as a Web browser applet has been DEPRECATED. In the case of the applet, it may only access quantified array data from the Web server that launched the applet.]
Figure 1.1.2 Overview of data preparation for quantified spot data used by MAExplorer. MAExplorer handles quantified spot data as shown in this figure. Arrays hybridized against labeled samples are scanned, and spots are quantified into spot data files. Quantified spot data is represented as tab-delimited data with one spot per row. Each spot is identified in this file by its grid coordinates (grid, grid row, grid column), with image (X,Y) coordinates being optional. Quantified spot data includes the raw spot intensity for each channel (in the case of multiple channels such as Cy3, Cy5, etc.). If the original data has background spot intensity values, then those may be included as well; otherwise no background data will be available for background correction. The spot data is discussed in more detail in Section 1.1, Appendix C.1, and Appendix C.3.
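As a rough illustration of the one-spot-per-row, tab-delimited layout described in the Figure 1.1.2 legend, the following Python sketch reads such a table into a dictionary keyed by grid coordinates. The column names (grid, grid_row, grid_col, intensity1, intensity2) are assumptions made for this example only; the actual field names are those defined in Appendix C.3.

```python
import csv
import io

# Hypothetical spot data: one spot per row, tab-delimited.
# The column names are illustrative, not the MAExplorer file format.
SAMPLE = (
    "grid\tgrid_row\tgrid_col\tintensity1\tintensity2\n"
    "1\t1\t1\t1200.0\t950.0\n"
    "1\t1\t2\t310.5\t402.0\n"
)


def read_spots(text):
    """Return a dict mapping (grid, grid_row, grid_col) to channel intensities."""
    spots = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = (int(row["grid"]), int(row["grid_row"]), int(row["grid_col"]))
        spots[key] = (float(row["intensity1"]), float(row["intensity2"]))
    return spots


spots = read_spots(SAMPLE)
print(spots[(1, 1, 2)])
```

Keying on the grid coordinates mirrors the way each spot is identified in the file; optional image (X,Y) coordinates or background values would simply be additional columns.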
Figure 1.1.3 Overview of running MAExplorer as a stand-alone application. The preferred way of running MAExplorer is as a stand-alone application. There are distinct advantages in running MAExplorer as an application: data and the exploration state may be saved on the user's local computer, and direct access to genomic servers is easier (no proxy server required - see Figure 1.4). MAExplorer plugin extensions (MAEPlugins) may only be used with the stand-alone version. Since MAExplorer is packaged for download for a variety of operating systems, this method is not difficult to set up, and the MAEPlugins should run on a variety of operating systems.
Figure 1.1.4 [DEPRECATED] Overview of running MAExplorer as a Web browser applet. An alternative way of running MAExplorer on existing databases is as a Web-browser applet. The advantage of this method is that no software installation is required on the user's computer. However, the user may not save data and the exploration state on their local computer. Furthermore, direct access to genomic servers requires a proxy server. MAExplorer plugin extensions (MAEPlugins) may not be used with the applet version. The Mammary Genome Anatomy Program (MGAP) originally used the MAExplorer applet.
Virgin = ( V.1, V.2, V.3 )
Pregnancy = ( P13.1, P13.2, P13.3 )
Lactation = ( L3.1, L3.2, L3.3 )
Involution = ( I4.1, I4.2, I4.3 )
Parturition = ( Virgin, Pregnancy, Lactation, Involution )
In MAExplorer we refer to grids by letter names (A,B,C,...) and fields by F1 and F2. If you are using Cy3/Cy5 ratio data and the Cy3 and Cy5 data is available as independent channels for each HP sample, then operations that use F1 and F2 will use the Cy3 and Cy5 data for various operations such as scatter plots (Cy3 vs Cy5), etc. If there is only one field in an array (i.e. no duplicate grids), then when MAExplorer is run, operations and menus describing F1 and F2 operations will not be available.
Using duplicate (F1 and F2) spots gives an estimate of the hybridization variance within an array. This estimate is used to compute the (F1,F2) gene coefficient of variation (CV), which the gene data Filter uses to remove noisy data before looking for additional differences. Note that if Cy3/Cy5 data is used, then F1 and F2 duplicates are not allowed, as MAExplorer uses the (F1,F2) data to hold the (Cy3,Cy5) data for a hybridized sample.
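The idea of a duplicate-spot CV can be sketched in Python as follows. This is only an illustration: it assumes the conventional definition of CV (sample standard deviation divided by the mean), which may differ from the exact formula MAExplorer uses, and the function names and threshold are hypothetical.

```python
from statistics import mean, stdev


def duplicate_cv(f1, f2):
    """CV of two duplicate spot intensities, taken as std. deviation / mean.

    Assumption: sample standard deviation; MAExplorer may define CV differently.
    """
    values = [f1, f2]
    m = mean(values)
    if m == 0:
        return float("inf")  # no signal, so treat the spot pair as unreliable
    return stdev(values) / m


def passes_cv_filter(f1, f2, max_cv):
    """Keep a gene only if its duplicate spots agree to within max_cv."""
    return duplicate_cv(f1, f2) <= max_cv


print(passes_cv_filter(90.0, 110.0, 0.2))
print(passes_cv_filter(50.0, 150.0, 0.2))
```

A gene whose two spots disagree strongly gets a large CV and is dropped before any further comparison is made.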
Example: special array spot coordinate numbering for the MGAP array
As an example of this coordinate system, the following describes the array geometry for the array used in the NIDDK MGAP database. The general principle is the same for other arrays with different sizes and numbers of fields. The MGAP array was spotted by Research Genetics for MGAP. Clones in the array are laid down in grids consisting of 8 rows and 24 columns per grid. There are 8 grids (named A through H or 1 to 8) to a field with a space between grids. Finally, there are two fields (left and right, named 1 and 2 or F1 and F2) that are duplicates.
Note: we currently present the MGAP arrays with grids A through H oriented from top to bottom, whereas Research Genetics orients them rotated +90 degrees with grid H to the left and grid A to the right. This occurred when the images were scanned with a -90 degree change in the orientation. Therefore, we have swapped rows and columns in our relative orientations so that it meets users' normal expectations of row-column orientation. This could be easily changed to the Research Genetics convention using a parameter in the configuration file. Since the actual plate coordinates are tracked with each clone and reported when it is accessed in MAExplorer, the image coordinate system is not that critical, although the verisimilitude of actual array layout and the data-mining layout can be useful.
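The arithmetic implied by the MGAP geometry described above (8 rows by 24 columns per grid, 8 grids per field, 2 duplicate fields) can be sketched in Python. The linear ordering used by spot_index is an assumption of this sketch, not the ordering MAExplorer uses internally; grid_label follows the field#-GridLetter labeling used in the pseudoarray images.

```python
ROWS_PER_GRID = 8
COLS_PER_GRID = 24
GRIDS_PER_FIELD = 8
FIELDS = 2

SPOTS_PER_GRID = ROWS_PER_GRID * COLS_PER_GRID      # 192
SPOTS_PER_FIELD = SPOTS_PER_GRID * GRIDS_PER_FIELD  # 1536
TOTAL_SPOTS = SPOTS_PER_FIELD * FIELDS              # 3072


def grid_label(field, grid):
    """Label such as '1-C' for field 1, grid 3 (all numbers are 1-based)."""
    return f"{field}-{'ABCDEFGH'[grid - 1]}"


def spot_index(field, grid, row, col):
    """Zero-based linear index; the ordering is an assumption of this sketch."""
    grid_number = (field - 1) * GRIDS_PER_FIELD + (grid - 1)
    return grid_number * SPOTS_PER_GRID + (row - 1) * COLS_PER_GRID + (col - 1)


print(TOTAL_SPOTS, grid_label(1, 3), spot_index(2, 8, 8, 24))
```

Because the two fields are duplicates, the array carries 1536 distinct spots, each printed twice.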
Various gene identifiers may be present in the GIPO data file associated with the array. One of these is selected as a unique identifier to represent genes in the MAExplorer database. Normally, the Master gene ID is defined as the Clone ID. However, if the Clone ID is not present but the GenBank ID is, it will use the latter as the identifier. If neither GenBank nor Clone ID is present, it will use GenBank5' then GenBank3' if present. If those are not present, it will use the UniGene ID if it is present. If that is not present, it will use dbEST5' then dbEST3' if present. If those are not present, it will use the LocusLink LocusID if present. Finally, if none of those identifiers are present, you can specify a 'Generic ID' that is related to some other database gene identifier such as a 'Location' identifier.
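The fallback order described above amounts to taking the first identifier type, in priority order, that is present in the GIPO file. A minimal Python sketch follows; the field-name strings and the function name are illustrative assumptions, not the column names MAExplorer actually expects.

```python
# Priority order taken from the paragraph above; the strings themselves are
# illustrative labels, not the real GIPO column names.
MASTER_ID_PRIORITY = [
    "Clone ID",
    "GenBank",
    "GenBank5'",
    "GenBank3'",
    "UniGene ID",
    "dbEST5'",
    "dbEST3'",
    "LocusID",
    "Generic ID",
]


def choose_master_id(available_fields):
    """Return the highest-priority identifier type that is present, else None."""
    present = set(available_fields)
    for field in MASTER_ID_PRIORITY:
        if field in present:
            return field
    return None


print(choose_master_id(["UniGene ID", "GenBank"]))
```

Expressing the rule as an ordered list keeps the priority in one place, so it is easy to compare against the prose.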
The current gene may be specified by clicking on a spot in the microarray image or on a point in the popup scatter plot, or a gene ID cell in a report.
ratio(x,y,c) = Ixc / Iyc

where: samples x,y have values Ixc and Iyc for the same gene c in samples HP-X and HP-Y

The Zscore method transforms the data such that it can not be used with the ratio comparison. Instead we use the Zdiff(x,y) method for comparing Zscore developed by Mark Vawter (Vawter, 2000). Zscores typically cover the range of -3.0 to +3.0 (standard deviations) with a transformed mean of 0.0. Therefore the Zdiff will typically cover the range of -6.0 to +6.0.

Let Zscore(p,c) = (Ipc - meanp) / stdDevp

where: Ipc is the intensity of gene c for sample p. Sample p has meanp and stdDevp

Then, Zdiff(x,y,c) = Zscore(x,c) - Zscore(y,c)

where: samples x,y have Zscore(x,c) and Zscore(y,c) normalized values for the same gene c in samples HP-X and HP-Y, or HP-X 'sets' and HP-Y 'sets'.
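The ratio, Zscore and Zdiff definitions above translate directly into a few lines of Python. This is a sketch only: the intensities are made-up values, and the use of the sample standard deviation for stdDevp is an assumption, since the text does not say which estimator MAExplorer uses.

```python
from statistics import mean, stdev


def ratio(ixc, iyc):
    """ratio(x,y,c) = Ixc / Iyc for one gene c."""
    return ixc / iyc


def zscores(intensities):
    """Zscore(p,c) = (Ipc - meanp) / stdDevp for every gene c of sample p.

    Assumption: stdDevp is the sample standard deviation.
    """
    m = mean(intensities)
    s = stdev(intensities)
    return [(value - m) / s for value in intensities]


def zdiff(zx, zy):
    """Zdiff(x,y,c) = Zscore(x,c) - Zscore(y,c), gene by gene."""
    return [a - b for a, b in zip(zx, zy)]


# Made-up intensities for three genes in samples HP-X and HP-Y.
hp_x = [100.0, 200.0, 300.0]
hp_y = [110.0, 190.0, 330.0]

print(zdiff(zscores(hp_x), zscores(hp_y)))
```

Because each sample is centered and scaled by its own mean and standard deviation, the Zdiff of a sample with itself is zero for every gene.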
The Filter menu is used to select a set of data filters that determine which genes are selected. These are highlighted in the array image with a red (white) circle in the intensity (ratio) pseudoarray image around each spot meeting the range threshold criteria. How these are highlighted depends on which Plot menu Show Microarray method and View menu modes were selected. If the Show 'Edited Gene List' (EGL) option is set in the View menu, genes in the EGL will appear as magenta squares. The "Filter mode" is always present and shows genes meeting various Filter criteria (to be discussed). The user may interactively define a list of genes by clicking on them when the Click to add gene to edited gene list option is set in the Edit menu. Alternatively, you can click on a gene with the Control key pressed to add a gene to the EGL or with the Shift key pressed to delete a gene from the EGL.
In all of the pseudoarray images, the grids in the image are labeled field#-GridLetter (e.g. 1-C, 2-B, etc). This allows them to be clearly identified as the user scrolls over the image that is larger than the visible computer window.
There is also a popup alert message window for better informing users of conditions that prevent them from doing the operation they requested. You must press the Close button to pop down the message, although you may first press the SaveAs button to save the message to a file. For complex problems, some of the messages may suggest what you need to do to correct the problem.
Hybridized samples are selected from a list of all of the samples in the database. To make it easier to select an HP, samples may be selected from submenus by their developmental stage (if supported by your particular database) or from a list of all samples in the database located on the left side of the pseudoarray image. If a sample has never been loaded during a session, it will be loaded when you request it.
The last sample selected is called the current sample or current HP. That is the sample that is displayed in the pseudoarray image in the primary MAExplorer window when using display modes requiring a single sample.
Figure 1.3 Data Filter Venn diagram. This illustrates some of the logical, data range and statistical test criteria available using the MAExplorer data Filter paradigm. Note that multiple criteria may be selected from each of these categories. The extreme case, probably never used, could use all tests.
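One way to picture combining several filter criteria is as an intersection: a gene is kept only if it passes every active test. The Python sketch below assumes that intersection semantics, as the Venn diagram suggests; the gene records, field names and thresholds are hypothetical.

```python
def apply_filters(genes, tests):
    """Return names of genes that pass every active test (set intersection)."""
    return [g["name"] for g in genes if all(test(g) for test in tests)]


# Hypothetical gene records with a ratio and a duplicate-spot CV.
genes = [
    {"name": "geneA", "ratio": 3.2, "cv": 0.10},
    {"name": "geneB", "ratio": 1.1, "cv": 0.05},
    {"name": "geneC", "ratio": 4.0, "cv": 0.60},
]

# Two illustrative criteria: a data-range test and a statistical test.
tests = [
    lambda g: g["ratio"] >= 2.0,
    lambda g: g["cv"] <= 0.25,
]

selected = apply_filters(genes, tests)
print(selected)
```

With no tests active, every gene passes, which matches the intuition that each added criterion can only narrow the selection.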
A first-approximation approach to data-mining might be to sequentially constrain the data of interest to find some changes and then to report on those changes. We have arranged these commonly performed first-pass operations as submenu entries in the Analysis Menu. The submenus are:
Figure 1.4 Screen view of MAExplorer main window with Analysis Menu. The menu structure of MAExplorer was designed to allow users to quickly perform commonly used data-mining operations. Other menus are used for modifying the data (File, Samples, Edit, and View menus) or accessing on-line Help menu information in a separate Web browser popup window. MAExplorer menus are similar to most Windows PC applications where pull-down menu selections are used to invoke operations. The current hybridized array sample is displayed as a pseudocolor ratio image of median normalized spot intensities. Clicking on a spot assigns it as the current gene with data being reported in the topmost message area. The names of the current HP-X and HP-Y samples are listed above that area. In general, clicking on spots, points in plots or cells in spreadsheet reports will assign it as the current gene and access Web genomic databases if enabled.
In addition to displaying the hybridized sample pseudoarray images, derived data may be viewed in various types of plots. These include scatter plots, histograms, ratio-histograms, expression profiles, gene clustering, etc. Data may be presented as table reports, either as active spreadsheets that can access genomic databases by clicking on cells or as tab-delimited Excel-compatible tables that may be cut (if your windowing system supports this) and pasted into an Excel spreadsheet.
The selected HP-X and HP-Y samples are used when generating scatter plots, ratio histograms and other graphics. Scatter plots and ratio histograms may also be performed on the left and right sides of the currently displayed HP array (fields F1 and F2 respectively if array data has duplicate spots for the same genes).
A MAExplorer database contains a table identifying genes, so data is accessible by gene name as well as by substrings identifying a set of genes (e.g. "onco", which could be used to find any oncogene or proto-oncogene in the database).
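Substring lookup of this kind can be sketched in a few lines of Python. The gene names are made up for the example, and treating the match as case-insensitive is an assumption of this sketch rather than documented MAExplorer behavior.

```python
def find_genes(gene_names, substring):
    """Return the gene names containing substring (case-insensitive here)."""
    needle = substring.lower()
    return [name for name in gene_names if needle in name.lower()]


# Made-up gene names for illustration.
names = ["c-myc proto-oncogene", "beta-casein", "Oncostatin M"]
print(find_genes(names, "onco"))
```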
When the program starts, it displays the microarray image of the first hybridized sample in the HP-X set of samples initially specified. If you specify a new HP-X or HP-Y sample, then it changes the pseudoarray image to correspond to that array. You may change the current HP-X or HP-Y sample from either the Samples pull-down menu or by clicking on a sample in the Active Sample list to the left of the pseudoarray image. If you click the mouse on or near a spot, it will latch onto that spot and define it as the current gene.
Note: In Figure 1.4, genes that pass the MAExplorer data Filters are indicated by red (white) circles around spots in the pseudograyscale (pseudocolor) intensity (ratio) image. The pseudoarray image shows the gene data as replicate grids of spots if there are two fields: Field 1 (left set of gridded spots) and Field 2 (right set of gridded spots). If there is no duplicate spot data, then only Field 1 is shown.
If background correction is enabled in the Normalization menu, then intensity is reported in the message displays as intensity' (with a prime); otherwise it is reported as intensity. Normalization should also be used between hybridized samples, whether the data is ratio data (i.e. Cy3/Cy5) or single sample intensity arrays.
Setting up MAExplorer to work with user-specific data is discussed later in this manual in Appendix C.
Figure 1.5.1 The MicroArray Explorer home page at http://maexplorer.sourceforge.net/. The table of contents in the left panel lists an introduction and short tutorial, and several demonstration databases. Below that are links to documentation including this reference manual, glossary and index. The Export version discusses running MAExplorer with other arrays and as a stand-alone version. The Download application is a Web page for downloading and installing the stand-alone Java application on your computer.
You may start MAExplorer in your Web browser from the MGAP Startup DB. This offers several preset public databases consisting of sets of hybridized samples as well as the empty database. After you have clicked on a particular startup database, it will begin loading MAExplorer - indicated by a red box with a "Loading..." message in the top window of your browser. After MAExplorer starts, this message changes to a white box with "Reading DB" while it downloads the data files required. Finally, when it is ready for your interaction, it displays a white box with a green "Ready".
NOTE: for Web browser invocation, the MAExplorer applet works with Netscape 4.7, Internet Explorer 5.0, and HotJava on a Windows (95/98/NT/2000/XP) system or a Solaris Unix system. Macintosh and SGI systems seem to hang at times because of Web browser problems. However, it works on all other systems as a stand-alone Java application that you may download and install on your computer. You might want to review these Web browser restrictions.
After MAExplorer has started and the menus become active, you may switch the preset hybridized samples to other samples using the Samples pull-down menu. The last hybridized sample loaded becomes the "current hybridized sample" and its image is the one displayed.
The following Sections 2.1 through 2.7 describe the pull-down menus in detail.
In stand-alone mode, the user may select the database subset to be loaded from either a Web server or a local file system. When used as an applet, this is pre-determined by the Web page where MAExplorer is started. Opening a disk DB, 'Open disk DB', also restores any user defined gene sets and other parts of the exploratory state that were present when the 'Save ... disk DB' was invoked.
In the following menus, selections that are sub-menus are
indicated by a sub-menu marker. Selections prefaced with a checkbox
marker are checkbox commands, and the marker shows whether the
command is enabled or disabled. Checkbox menu items
have a "[CB]" at the end of the command. Selections prefaced with
a radio-button marker are multiple choice "radio button" commands,
and the marker shows whether the command is enabled or disabled;
only one member of the group is allowed to be
on at a time. Radio button menu items have a "[RB]" at the end of
the command. Selections prefaced with a '#' are available
only when MAExplorer is run in the
stand-alone mode. Selections prefaced with a '*'
require access to the backend Web server [Future]. Selections that
are not currently available will be grayed out in the menus of the
running program.
When used as an applet connected to a Web database server, databases may be divided into public and collaborator projects. Users accessing protected collaborator projects will be required to log-in to the server and a popup login request will appear.
[In the future], each user will be able to save the state of their exploration into a password-protected directory of named states on a Web server (e.g. by doing a 'Save ... Web DB' command). Later, they could restore that state from the Web server by doing an 'Open Web DB' command. Users would be required to register with that server to set up a unique state-saving area. Once this facility is set up, users may selectively allow other users to view selected data, implementing a groupware environment for improving collaboration.
Figure 2.1.1 Example of the "Open file DB" command. The file browser is opened in the current project directory with the name of the currently opened file. You may select another .mae startup database file to load in the current project. You may also "cruise" the file system and load an .mae file from a different project directory. The "Set project" command makes this easier since it gives you a list of available projects that you may change directly. The projects must have been set up on your computer previously. The "New project" command can be used for setting up new projects.
Figure 2.1.2 Example of saving a user session in a new startup file using the "SaveAs DB" command. The file browser is opened in the current project directory with the name of the currently opened file. You may enter another .mae file name to save your current session. Then when you restart MAExplorer using this new file, it will restore the data mining state to where you left off (except that no popup windows are opened).
A registered user may allow another registered user to access their state or states (using the Open another user's state command) if the user owning the data has granted them permission. The Share user state and Unshare user state commands control these permissions. There are two special share-users defined: public to allow unlimited read-only access to the state they specify, and private to disallow all access to a user state.
The first menu command, "Choose HP-X, HP-Y and HP-E samples", lets you change the current working HP-X 'set', HP-Y 'set', and HP-E 'list' hybridized samples.
The fourth menu command, "Set Samples from lists", lets you change the current HP-X and HP-Y samples as well as the HP-X 'set', HP-Y 'set', and HP-E 'list' samples. This is similar to using the "Choose HP-X, HP-Y and HP-E samples" command, but is more difficult to use. You may also change the current HP-X or HP-Y sample by clicking on the sample name directly in the list of sample names on the left side of the pseudoarray image (see Figure 2.2.3 legend).
The fifth menu entry, "Edit use (Cy5/Cy3) else (Cy3/Cy5) for each HP", lets you swap data channels for Cy3/Cy5 data for individual samples.
Other menu commands list the status of the current HP-X 'set', HP-Y 'set', or HP-E 'list', and define condition class names that are associated with the HP-X 'set' and HP-Y 'set'. The last menu entry, "Use HP-X & HP-Y 'sets' else single samples", lets you switch between using HP-X and HP-Y as single samples or sets of multiple samples. For example, if you are using a scatter plot of X and Y, it will switch the data being plotted from a comparison of single samples to a comparison of means of sets of samples depending on the status of the switch. Sets of samples are used extensively in data explorations.
Figure 2.2.1 Samples menu - selecting lists of samples by using the "chooser". The hybridized samples assigned to the current HP-X, current HP-Y, set of HP-X, set of HP-Y and expression profile list HP-E may be changed from the Samples pull-down menu. The Choose HP-X, HP-Y and HP-E option lets you graphically change the currently active sample HP-X and HP-Y sets and the E-list.
Figure 2.2.2 Samples menu - selecting samples by source characteristics. The hybridized samples assigned to the current HP-X, current HP-Y, set of HP-X, set of HP-Y and expression profile list HP-E may be changed from the Samples pull down menu. The specific "By Source" menus shown here are from the MGAP database. This figure shows the user changing the current X sample from the developmental stages submenu that is part of the "By Source" submenu. Alternatively, samples containing a keyword or part of a keyword can be found using a "guesser" popup window that allows the use of wild cards. This is invoked using the "From list of all H.P.s" submenu. For example, you could specify "*pregnancy*" to find all samples of containing that word.
Figure 2.2.3 Changing the current sample to either the HP-X or HP-Y sample by clicking on a sample name at the left edge in the microarray pseudoarray image. The current sample is indicated in magenta. Click on the magenta "*" adjacent to the new name you want to select and it will change the HP-X sample. To switch between setting HP-X and HP-Y, click on the [X] Current Sample box to change the sample to HP-Y. You can click on [Y] Current Sample box to change it back to HP-X. Then clicking on a sample name will set it to the current HP-X or HP-Y that was selected. This figure shows that the user had selected [Y] and C57B6-L10-29hrs for the new HP-Y sample.
The Set current HP-X sample and Set current HP-Y sample commands offer another way to set the single current X and Y sample (see Figure 2.2.3 above for the preferred way using the "Chooser").
The Edit HP-X & HP-Y 'sets' of samples by source menu allows the user to define HP-X and HP-Y as sets having multiple hybridized samples. Then, the mean values of the genes are used when comparing HP-X with HP-Y.
For example, the By Source database-specific entries for the MGAP database include the following submenus.
The From list of all samples selection pops up a hybridized sample guesser dialog window. As with the gene name guesser, you can start typing in the name of a sample and it will give you a list of HPs that match that initial string. You then click on the sample you want and then press the Done button.
Figure 2.2.4 Samples menu - selectively swapping (Cy3,Cy5) data channels for particular samples. This is only operative if your database contains Cy3/Cy5 ratio labeling data. This is useful in databases containing subsets of dye-swap experiments mixed in with other samples that are not dye-swapped.
Figure 2.2.5 shows a screen illustrating a popup condition chooser session. The set of all samples in the database is in the scrollable "Remainder Samples" window in the upper left. The samples you have selected for the condition list being edited are shown in the upper right "Selected Samples in current condition" window. The list of all conditions in the database is in the lower left "List of Conditions" window. The currently selected condition list is highlighted and its contents are displayed in the "Selected Samples" window. User-defined annotation associated with the current condition is displayed in the right "Current Conditioned Annotation" window. To add a new condition, click on the Add Cond button to define the new condition name. The Remove Cond button is used to delete a named condition list. The List Cond button pops up a report listing the samples and annotation for the current condition. The List All button pops up a report listing the names of all of the conditions and the annotation names. You may add or remove annotation names for all of the conditions. The Add Ann button will add the new annotation you enter into all conditions - you must enter the data for each condition that requires it. You may Save the current status of all of the conditions into your working database. If you press Cancel before saving, your edits will not be saved. Pressing the Done button will save the changes and pop down the window.
Figure 2.2.6 shows a screen illustrating a popup ordered condition list (OCL) chooser session. The set of all conditions in the database is in the scrollable "Remainder Conditions" window in the upper left. The conditions you have selected for the OCL being edited are shown in the upper right "Selected Conditions in current OCL" window. The list of all conditions in the database is in the lower left "List of Conditions" window. The currently selected OCL is highlighted and its contents are displayed in the "Selected Conditions" window. User-defined annotation associated with the current OCL is displayed in the right "Current OCL Annotation" window. To add a new OCL, click on the Add OCL button to define the new OCL name. The Remove OCL button is used to delete a named condition list. The List OCL button pops up a report listing the conditions and annotation for the current OCL. The List All button pops up a report listing the names of all of the OCLs and the annotation names. You may add or remove annotation names for all of the OCLs. The Add Ann button will add the new annotation you enter into all conditions - you must enter the data for each condition that requires it. You may Save the current status of all of the OCLs into your working database. If you press Cancel before saving, your edits will not be saved. Pressing the Done button will save the changes and pop down the window.
Sets of genes or HP condition lists are very useful for tracking complex data-mining sequences of analysis. For example, derived named gene sets may be used in successive data filters and for reports. As an illustration, one could do the following experiment given four different types of HPs (e.g. virgin, pregnancy, lactation, and involution):
First compare two HPs using a statistical test such as a t-test. Then save the resulting set of genes under the name "virgin vs. pregnancy". Then compare the next two HPs and save the resulting genes under the name "lactation vs. involution". Finally, compute the difference of genes found in "virgin vs. pregnancy" that are not found in "lactation vs. involution". This resulting gene set could then be saved (e.g. with a name "Genes found in virgin vs. pregnancy, but not in lactation vs. involution"). Similarly, taking the intersection of these two named sets shows genes that are common between the two sets. Taking the union shows genes found in either of the two named sets.
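The difference, intersection and union of two named gene sets described above can be sketched in a few lines of Java. This is only an illustration using the standard collections: the class name `GeneSetOps`, its method names and the gene identifiers are hypothetical and are not part of MAExplorer.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative gene set algebra; hypothetical names, not MAExplorer's own code. */
public class GeneSetOps {

    /** Genes found in either set. */
    public static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    /** Genes common to both sets. */
    public static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** Genes found in a that are not found in b. */
    public static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Made-up gene identifiers standing in for two saved gene sets.
        Set<String> virginVsPregnancy =
            new LinkedHashSet<>(List.of("geneA", "geneB", "geneC"));
        Set<String> lactationVsInvolution =
            new LinkedHashSet<>(List.of("geneB", "geneD"));

        System.out.println(difference(virginVsPregnancy, lactationVsInvolution));   // [geneA, geneC]
        System.out.println(intersection(virginVsPregnancy, lactationVsInvolution)); // [geneB]
        System.out.println(union(virginVsPregnancy, lactationVsInvolution));        // [geneA, geneB, geneC, geneD]
    }
}
```

Each operation copies its first argument, so the original named sets are left unchanged and can be reused in later operations.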
The Edit menu contains the following main selections. All of these entities and preferences are saved as part of the startup state when you do a (File | Databases | SaveAs ... DB).
Figure 2.3.1 Edited Gene List defined from the Gene Name Guesser using wildcards. The Edited Gene List was defined as the set of genes whose names contain the sub-string "onco". The sub-string was specified to the popup guesser window as "*onco*" using '*' characters as wildcard symbols indicating that it should match any or no characters. The button Gene Name may be toggled through a set of other identifiers including Clone ID, UniGene ID, dbEST 3', dbEST 5', GenBank 3', GenBank 5', LocusID, etc. depending on what identifiers are available in your database. The user then pressed the Set E.G.L. button on the guesser window that sets the E.G.L. to those genes. If you have enabled the View menu "Show 'edited gene list'" option, then the genes in the E.G.L. are shown as magenta squares in the pseudoarray image. You may do additional editing to manually add or remove genes in the set. If a 2D scatter plot was being used, E.G.L. labeled genes would appear there as well. To select a particular gene as the current gene, click on the gene you want in the list, then press the Done button.
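The '*' wildcard behaviour described in this legend (match any or no characters) can be sketched by translating the pattern into a regular expression. The class `WildcardGuesser` and its methods are hypothetical, and case-insensitive matching is an assumption made for this sketch; the manual does not state how the guesser treats case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative '*' wildcard matcher; hypothetical, not the MAExplorer guesser. */
public class WildcardGuesser {

    /** Converts a pattern such as "*onco*" into a regex; '*' matches any or no characters. */
    public static Pattern toPattern(String wildcard) {
        String[] literals = wildcard.split("\\*", -1); // keep empty leading/trailing pieces
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i])); // treat everything else literally
        }
        // Assumption: ignore case when matching.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns the names that match the wildcard pattern, in their original order. */
    public static List<String> match(List<String> names, String wildcard) {
        Pattern pattern = toPattern(wildcard);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

With a made-up list such as "proto-oncogene A", "Casein alpha" and "ONCOGENE B", the pattern "*onco*" selects the first and third names, while "onco*" selects only the third.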
The following is an example of List saved gene sets state listing the catalog of named gene subsets in some of the MGAP data. Note that sets #1 to #11 are fixed by the data in the GIPO file and may not be changed by the user. Sets #12 to #14 are assignable from other sets or in the case of the E.G.L, by various MAExplorer operations. Sets #1 through #14 may not be removed whereas #15 and higher may be removed.
User Gene Sets
Set# |#genes| title
=======================
#1 |1727| ALL GENES
#2 |394| ALL NAMED GENES
#3 |246| ESTs similar to genes
#4 |456| ESTs
#5 |1096| All genes and ESTs
#6 |1681| Good genes
#7 |40| Replicate genes
#8 |0| HousekeepingGenes
#9 |96| Calibration DNA
#10 |77| Your plates
#11 |46| Empty wells
--------- User Assignable ----------
#12 |0| User Filter Gene Set
#13 |60| Edited Gene List
#14 |0| Normalization Gene Set
--------- User definable------------
#15 |60| The 60 genes closest to Carbonic Anhydrase-III
#16 |30| Named genes in the 60 genes closest to CA-III
#17 |4| Replicate genes in the 60 genes closes to CA-III
The following figure illustrates selecting sets by name for gene set operations.
Figure 2.3.2 Selection of gene sets for binary gene set operations. This example computes the Boolean AND of two sets "ALL NAMED GENES" and "60 genes closest to CA-III from Named and Ests", and then the AND of the "Replicates" with the previous result. The first result is saved in the set called "The 60 genes closest to Carbonic Anhydrase-III". The second result is saved in the set called "Named genes in the 60 genes closest to CA-III". Finally, the third result is saved in the set named "Replicate genes in the 60 genes closes to CA-III".
The following is an example of List saved HP condition lists state listing the catalog of named HP condition lists.
Condition Lists
===============
Condition[1] #HPs 2, [Initial HP-X: C57B6 pregnancy day 13]
Condition[2] #HPs 2, [Initial HP-Y: Stat5a (-,-) pregnancy day 13]
Condition[3] #HPs 4, [Initial HP-E expression list]
The following is an example of List contents of saved HP condition list state.
Condition List #1 [Initial HP-X: C57B6 pregnancy day 13]
====================================
HP[1] Pregnancy 13 (1 hr) [C57B6-p13-totalRNA5ug]
HP[2] Pregnancy 13 (1 hr) [C57B6-p13.2poly-A]
Figure 2.3.4.1 Popup window allowing you to adjust all threshold slider values. The Adjust all Filter threshold scrollers command allows you to pre-adjust all threshold slider values used in data filtering and in clustering. It may be easier to set the approximate range before invoking the clustering operation because changing a parameter will recluster your data.
The Define HP-X (HP-Y) class name command may be used to change the names of the HP-X (HP-Y) experimental condition sets. These names are used in various labels in the main window, popup plots and reports, etc. The commands to change various names of database components are in the Preferences submenu in the Edit menu.
Figure 2.4 MAExplorer main window with Analysis Menu. The menu structure of MAExplorer was designed to allow users to quickly perform commonly used data-mining operations as a first approximation analysis.
Figure 2.4.1.1 Gene Class menu. The user may select a subset of genes that belong to one of the classes of genes. This shows the user selecting the set of "All named genes", which are indicated with a red (white) circle over the spots in the array intensity (ratio) pseudoarray image.
Figure 2.4.1.2 Example of all replicated genes occurring more than once in the array. This was selected by using the GeneClass 'Replicate genes'. You may use the data Filter "Filter by genes with replicates" instead of the GeneClass. This has the advantage that you may use other GeneClasses (e.g. ESTs, or All named genes, etc.). Alternatively, you can find all of the replicates for a particular gene by 1) using the Gene Guesser to find the particular gene you want; 2) pressing "Set E.G.L." to save it as an Edited Gene List; 3) enabling the Filter "Filter by E.G.L." at the same time. This will show all occurrences of that gene.
Some of the above gene classes are deduced from the gene name supplied with the Gene In Plate Order (GIPO) file for the array. We use the following automatic classification rules shown in Table 2.4.1.
Table 2.4.1 Rules for the automatic classification of gene names into the default Gene Class sets. The gene name is analyzed without regard to alphabetic case.
Gene class | Rule for class membership |
---|---|
All genes | all genes on the array |
All named | genes not starting with "EST" |
ESTs similar to genes | genes starting with "EST," |
ESTs | genes with the name "EST" |
Replicate genes | genes with multiple copies |
Calibration DNA | genes using the configuration file name "calibDNAname" (optional - see Appendix Table C.4.1) |
Your plates | clones using the configuration file name "yourPlates" (optional - see Appendix Table C.5.1-C) |
Empty Wells | empty wells where no spot exists on the array indicated by keywords "empty", "empty well" or "EmptyWell" (optional - see Appendix Table C.5.1-C) |
Good Genes | spots on the array where the GIPO QualCheck data was used and was valid. If it was not used, then it assumes all spots are good. (optional - see Appendix Table C.4.1) |
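The name-based rules in Table 2.4.1 can be sketched as a small classifier. This is only an illustration: the class `GeneClassRules`, the enum and the ordering of the checks are assumptions for this sketch, not MAExplorer's implementation. Names that start with "EST" but match neither EST rule are returned as UNCLASSIFIED here because the table does not cover that case, and classes that depend on the configuration file or on replicate counts are omitted.

```java
import java.util.Locale;

/** Illustrative name-based gene classification following Table 2.4.1 (hypothetical code). */
public class GeneClassRules {

    public enum GeneClass { NAMED, EST_SIMILAR_TO_GENE, EST, EMPTY_WELL, UNCLASSIFIED }

    /** Classifies a gene name without regard to alphabetic case. */
    public static GeneClass classify(String geneName) {
        String name = geneName.trim().toUpperCase(Locale.ROOT);

        // Assumption: empty-well keywords are checked before the EST rules.
        if (name.equals("EMPTY") || name.equals("EMPTY WELL") || name.equals("EMPTYWELL")) {
            return GeneClass.EMPTY_WELL;
        }
        if (name.equals("EST")) {
            return GeneClass.EST;                 // genes with the name "EST"
        }
        if (name.startsWith("EST,")) {
            return GeneClass.EST_SIMILAR_TO_GENE; // genes starting with "EST,"
        }
        if (!name.startsWith("EST")) {
            return GeneClass.NAMED;               // genes not starting with "EST"
        }
        return GeneClass.UNCLASSIFIED;            // not covered by the table
    }
}
```

For example, `classify("Casein alpha")` yields NAMED and `classify("est, similar to casein")` yields EST_SIMILAR_TO_GENE under these assumptions.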
Note: although this set of normalization methods is limited, it is adequate for some analyses of the data. We are in the process of adding more normalization methods through MAEPlugin methods.
Some quantification software (e.g. Research Genetics' Pathways 2.01) measures background globally as: BGLow (low background), BGAvg (average background), BGRms (root mean square background). For MGAP, MAExplorer uses the BGLow value when you request background subtraction. These values are read from the MAExplorer Samples DB file (see Appendix Table C.2.1.1). For other quantification programs, background may be available on a per-spot basis in the quantification files. If the latter is available in your data, it will be used if background correction is enabled (see Appendix C.3).
The background corrected intensity I'ij is computed from the raw intensity Iij and background intensity bkgrdHPi for H.P. i and spot j as follows:
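$I'_{ij} = I_{ij} - \mathrm{bkgrdHP}_i$
$(\mathrm{Cy3}_{hj} - \mathrm{BkgrdCy3}_{hj}) / (\mathrm{Cy5}_{hj} - \mathrm{BkgrdCy5}_{hj})$
$\mathrm{Zscore}_{ij} = (I_{ij} - \mathrm{mnI}_i)/\mathrm{sdI}_i$, and $\mathrm{Zdiff}_j(x,y) = \mathrm{Zscore}_{xj} - \mathrm{Zscore}_{yj}$.
$Im_{ij} = I_{ij}/\mathrm{medianI}_i$
$Im_{ij} = \log(1.0 + I_{ij}/\mathrm{medianI}_i)$
$\mathrm{ZlogS}_{ij} = (\log(I_{ij}) - \mathrm{mnLI}_i)/\mathrm{sdLI}_i$
$\mathrm{ZlogA}_{ij} = (\log(I_{ij}) - \mathrm{mnLI}_i)/\mathrm{madLI}_i$
$Igs_i = \sum_{j} I_{ij}$, summed over the genes $j$ in HP $i$.
Then, the normalized intensity $I'_{ij}$ is computed as:
$I'_{ij} = I_{ij}/Igs_i$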
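A minimal Java sketch of three of the normalizations above (median, Zscore with Zdiff, and division by the sum over genes) may help make the formulas concrete. The class `NormalizationSketch` and its method names are hypothetical. The sketch assumes a sample standard deviation (dividing by n-1) and non-empty arrays with a non-zero median, sum and spread; the manual does not specify these details.

```java
import java.util.Arrays;

/** Illustrative per-sample normalizations (hypothetical code, not MAExplorer's). */
public class NormalizationSketch {

    /** Median of the values; the input array is not modified. */
    public static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Im_ij = I_ij / medianI_i for one sample i. */
    public static double[] medianNormalize(double[] intensities) {
        double med = median(intensities);
        double[] out = new double[intensities.length];
        for (int j = 0; j < out.length; j++) {
            out[j] = intensities[j] / med;
        }
        return out;
    }

    /** Zscore_ij = (I_ij - mnI_i) / sdI_i; assumes the sample (n-1) standard deviation. */
    public static double[] zscore(double[] intensities) {
        int n = intensities.length;
        double mean = 0.0;
        for (double v : intensities) {
            mean += v;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double v : intensities) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1));
        double[] out = new double[n];
        for (int j = 0; j < n; j++) {
            out[j] = (intensities[j] - mean) / sd;
        }
        return out;
    }

    /** Zdiff_j(x,y) = Zscore_xj - Zscore_yj for two samples of equal length. */
    public static double[] zdiff(double[] zscoreX, double[] zscoreY) {
        double[] out = new double[zscoreX.length];
        for (int j = 0; j < out.length; j++) {
            out[j] = zscoreX[j] - zscoreY[j];
        }
        return out;
    }

    /** I'_ij = I_ij / Igs_i, where Igs_i is the sum of I_ij over all genes j. */
    public static double[] sumNormalize(double[] intensities) {
        double sum = 0.0;
        for (double v : intensities) {
            sum += v;
        }
        double[] out = new double[intensities.length];
        for (int j = 0; j < out.length; j++) {
            out[j] = intensities[j] / sum;
        }
        return out;
    }
}
```

The log-based variants follow the same pattern with the logarithm applied first; they are left out because the base of the logarithm is not stated here.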
Figure 2.4.2.3 Scatter plot of HP-X and HP-Y 'sets' data. HP-X is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13 filtered by "All named genes and ESTs". A) A scatter plot using the Median normalization. B) A scatter plot using the Zscore of the logs normalization. Notice how the Casein alpha outlier is more apparent in the case of the Zscore log normalization. The skewed plot is characteristic of much microarray data. Some normalization methods (not currently included in MAExplorer) can compensate for some of these artifacts (Dutoit, 2000) and are planned for future MAEPlugins.
Figure 2.4.3 Filter menu. The Filter menu is a cascade of data filters that restricts the set of genes to those passing all enabled filters according to the criteria set for those filters. This figure shows the GeneClass filter set to "All genes and ESTs", and the spot CV filter and Ratio (X/Y) range filters being set interactively by the scroll bars on the right. The genes that pass the filter are indicated with a red (white) circle in the array intensity (ratio) pseudoarray image.
The Filter menu options are used to restrict the set of genes by pre-filtering the data with a series of cascaded filter criteria and tests. The resulting subset of genes passing the filter is then used in the plots, reports and other data analysis methods. Some of the filters require additional parameters that are set by the State scrollers. The user will automatically be prompted for changes to these scrollers (a threshold scrollers window will pop up) when the filter is activated or changed. These values may also be set from the Adjust all Filter threshold scrollers entry in the Preferences submenu in the Edit menu. The filters are broken up into subgroups in the following menu, with the grouping based on the type of criteria (i.e. gene set membership, data range, or statistical tests).
The Filter by Good Spot data submenu filter contains options that specify spots based on their quality. It filters out genes that do not have "Good Spot" values defined by the optional QualCheck spot data. (See the list of codes in Appendix C.4). If there is no such spot quality data, then all spots are considered "good". The filter is enabled by setting the "Filter by spots with Good Spot values" checkbox. All spots for the specified samples must meet the criteria. In the "Check spots for Good Spot mode" submenu, you may set the samples where the test may be applied to spots from the current HP, the single (HP-X,HP-Y) samples, (HP-X,HP-Y) 'sets' (replicated spots), or samples in the HP-E list selected to be used in the filter.
$CV_j = 2\,|f1_j - f2_j| / (f1_j + f2_j)$
If the database only has one field but replicate HPs, then you may use the HP-X & HP-Y 'sets' $CV_j$ to filter the genes. Then $CV_j$ values are tested against a CV threshold slider value to eliminate genes with a high coefficient of variation.
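The duplicate-spot CV test can be sketched as follows. The class `SpotCvFilter` and its method names are hypothetical. Treating a CV equal to the threshold as passing, and returning NaN (which never passes) when the two fields sum to zero, are assumptions made for this sketch.

```java
/** Illustrative duplicate-spot CV filter (hypothetical code, not MAExplorer's). */
public class SpotCvFilter {

    /** CV_j = 2|f1_j - f2_j| / (f1_j + f2_j); NaN when the sum is zero (assumption). */
    public static double cv(double f1, double f2) {
        double sum = f1 + f2;
        if (sum == 0.0) {
            return Double.NaN;
        }
        return 2.0 * Math.abs(f1 - f2) / sum;
    }

    /** A gene passes when its CV does not exceed the threshold; NaN never passes. */
    public static boolean passes(double f1, double f2, double cvThreshold) {
        return cv(f1, f2) <= cvThreshold;
    }
}
```

For duplicate spots of 90 and 110 the CV is 0.2, so the gene passes a threshold of 0.25; for spots of 10 and 30 the CV is 1.0 and the gene is eliminated.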
Figure 2.4.3.1 Filtering using multiple scrollers. This example is of Cy3/Cy5 time series data. It filters normalized spot intensity of the Cy3 and Cy5 channels independently ([SI1:SI2] inside range) where low intensity spots are eliminated. It then filters out genes outside of the [R1:R2] ratio range.
Figure 2.4.3.2 Using the Positive Intensity data Filter. This allows removing negative data if the data contains negative intensity values (e.g. some Affymetrix data has negative Average Difference values which could be read as Intensity for MAExplorer).
You may switch between different representations of the microarray spot pseudoarray image. It may be viewed as several different types of pseudo images including an intensity gray value and a pseudo-color Red/Black/Green image for ratio (HP-X/HP-Y) and Zscore (HP-X - HP-Y) data. The p-Value results of comparing a HP-X 'set' with a HP-Y 'set' of samples, or the CV of the HP-E 'list', can be displayed as a color spectrum pseudoarray image.
Depending on the origin of the array data, it may have the same verisimilitude as the original arrays. Otherwise, it is displayed in a generic pseudoarray image containing grids that will fit the window - these are not the same as the original array image. However, the pseudoarrays are useful for getting a rough idea of the global changes in the data between arrays and how many genes pass the data filter.
When enabled using one of the commands in the Section 2.4.5 Clustering menu, cluster data appears as blue circles or squares drawn as overlays on the pseudoarray image. These options are discussed in the section on clustering. If you are doing K-means clustering, the current cluster is displayed in the scatter plot if the latter is active.
Scatter plots, ratio and intensity histograms of the mean (HP-X/HP-Y) or (HP-X/HP-Y) 'set' data, or the F1/F2 or Cy3/Cy5 data. F1/F2 or Cy3/Cy5 plots are available if the data exists in your particular database. That might be the case with replicate spots or with Cy3/Cy5 data. If the normalization is set to a Zscore or log mean mode, it will compute Zscore scatter plots and histograms.
Clicking on spots in an array image or points in scatter plots sets the current gene and will bring up data on the gene or (optionally) access corresponding data from GenBank, UniGene, mAdb Clone, etc. databases in a popup Web browser. Clicking on a bin in a ratio or intensity histogram plot filters out all genes except for those in the range of that bin.
Expression profile plots of selected genes or subsets of genes for all samples in the HP-E list. These are active plots with data reported when the user clicks in the plot.
Clicking on a spot (i.e. gene) in the microarray pseudo image or on a point (i.e. gene) in the scatter plot defines that gene as the "current gene" that is used in other operations. The current gene is indicated in both plots with a green circle around it. Similarly, you may modify the 'Edited Gene List' from either the pseudoarray image or the scatter plot. When viewing is enabled, it overlays those genes with magenta squares.
Figure 2.4.4 Plot menu - selecting Ratio Pseudoarray image. This displays a pseudocolor, shown in the scale on the left, that indicates the ratio of the value of the HP-X sample / HP-Y sample (or 'sets' if the option to use HP-X and HP-Y 'sets' is enabled). If the data is Cy3/Cy5 data, then this displays the ratio of the ratios using the current normalization. Various other pseudoarray image representations could be used.
Table 2.4.4.1. Pseudocolors assigned to spots to represent data in the X/Y ratios or X-Y Zdiffs pseudocolor array images. Each color represents the normalized X/Y ratio or X-Y Zdiff depending on Normalization mode. The 9 colors of the boxes represent the normalized expression ranges.
. | . | . | . | . | . | . | . | . | Normalization mode - RBG |
bright green | . | . | dark green | Black | dark red | . | . | bright red | |
. | . | . | . | . | . | . | . | . | Normalization mode - dichromasy |
bright blue | . | . | dark blue | Black | dark orange | . | . | bright orange | |
<0.250X | 0.307X | 0.400X | 0.571X | 1.000X | 1.75X | 2.50X | 3.25X | >4.00X | Ratio data |
<-3.0 | -2.25 | -1.50 | -0.75 | 0.00 | 0.75 | 1.50 | 2.25 | >3.0 | Zscore data |
<-0.99 | -0.742 | -0.495 | -0.247 | 0.000 | 0.247 | 0.495 | 0.742 | >0.99 | Zscore Log data |
Clicking on a particular gene will report its specific quantification and identification values (See Section 3.3 on gene quantification). If the Enable display current gene in popup genomic DB Web Browser option is set in the View menu, then it will also pop up a Web browser with the data for that gene from the particular genomic DB, if it exists.
The same data is shown in a variety of normalization and display formats.
Figure 2.4.4.1.1.1 Pseudoarray intensity image of median normalized intensities of the current HP sample (C57B6 virgin 10 weeks from MGAP database). The graylevel scale on the left edge of the pseudoarray image indicates the spot intensity. All pseudoarray images have scales that vary depending on the type of pseudoarray being displayed.
Figure 2.4.4.1.1.2 Pseudoarray intensity image of Zscore normalized intensities of the current HP (C57B6 virgin 10 weeks from MGAP database).
Figure 2.4.4.1.1.3 Pseudoarray intensity image of ZscoreLog normalized intensities of the current HP (C57B6 virgin 10 weeks from MGAP database).
Figure 2.4.4.1.1.4 Pseudoarray intensity image of ZscoreLog normalized intensities of the dual HP-X and HP-Y individual samples. The Plot menu Show Microarray submenu toggle "Use dual HP-X & HP-Y samples" option is set. HP-X is a C57B6 pregnancy day 13 and HP-Y is a Stat5a (-,-) pregnancy day 13.
Figure 2.4.4.1.1.5 Pseudoarray intensity image of ZscoreLog normalized intensities of the dual HP-X and HP-Y sample 'sets'. The Plot menu Show Microarray submenu toggle "Use dual HP-X & HP-Y samples" option is set. The "Use HP-X & HP-Y 'sets'" option in the Samples menu is also set. HP-X is the mean of three 'C57B6 pregnancy day 13' and HP-Y is the mean of three 'Stat5a (-,-) pregnancy day 13'.
Figure 2.4.4.1.2.1 Pseudocolor array image of median normalized X/Y ratios. HP-X is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13. Each spot's color represents the normalized X/Y ratio depending on Normalization mode. The color of the box is one of 9 colors representing the normalized expression ranges and assigned according to the table "Ratio mode".
Figure 2.4.4.1.2.2 Pseudoarray color image of normalized X/Y 'set' mean value ratios. Mean of three HP-X C57B6 pregnancy day 13 samples and mean of three HP-Y Stat5a (-,-) pregnancy day 13 samples. Each spot's color represents the normalized X/Y 'set' ratios depending on Normalization mode. The color of the box is one of 9 colors representing the normalized expression ranges and assigned according to the table "Ratio mode".
Figure 2.4.4.1.2.3 Pseudoarray color image of X-Y Zdiffs. HP-X is C57B6 pregnancy day 13 and HP-Y is Stat5a (-,-) pregnancy day 13. Each spot's color represents the normalized X-Y Zdiff using the Zdiff normalization mode. The color of the box is one of 9 colors representing the normalized expression ranges and assigned according to the table "Zdiff mode".
Figure 2.4.4.1.2.4 Pseudoarray color image of X-Y Zdiff of log data. HP-X is a C57B6 pregnancy day 13 sample and HP-Y is a Stat5a (-,-) pregnancy day 13 sample. Each spot's color represents the normalized X-Y Zdiff using the ZdiffLog with StdDev normalization mode. The color of the box is one of 9 colors representing the normalized expression ranges and assigned according to the table "ZdiffLog mode".
Figure 2.4.4.1.2.5 Pseudoarray image showing color-coded p-values for t-test comparison of HP-X and HP-Y 'set' samples. The HP-X and HP-Y sets both have 2 samples each (more is obviously much better). The data was normalized using the Median and a spot intensity [SI1:SI2] data filter was applied to eliminate some of the noisy data. Each spot's color represents a p-value in the range indicated in the scale in the left edge of the image. Note that although all spots are assigned a p-Value, many may not be very significant if adequate preprocessing of the data (such as normalization and low intensity spot removal) has not been done. So use this display with care.
rSq=0.974, n=1728, X(mn+-sd)=(4.477+-7.845), Y(mn+-sd)=(12.379+-24.810)
The Scatter plots submenu includes:
If Cy3/Cy5 ratio data is being analyzed, then the "HP F1 vs F2 intensity" menu entry becomes
The following figure illustrates some of the scatter plots and zoomed regions using the scroll bars on the horizontal and vertical axes.
Figure 2.4.4.2 Scatter plot of HP-X and HP-Y single sample data. HP-X is C57B6 pregnancy day 13 and HP-Y is lactation day 1. A) An active scatter plot may be generated for the current HP-X and HP-Y samples filtered by "All named genes". B) Similar plot for HP-X and HP-Y 'sets' of replicate samples (3 pregnancy and 4 lactation samples in the sets respectively). Clicking on a point in the plot sets the current gene. C) Zoomed up region (of B) at the bottom of the plot showing more detail and filtered by just "All named genes". Zooming is performed by adjusting the X or Y axes limits scroll bars. Note the points enclosed in magenta boxes indicate genes in the E.G.L. gene list.
Figure 2.4.4.2.1 Scatter plot of multiple channel data from a single sample. A) F1 vs F2 data for a C57B6 pregnancy day 13 sample. B) Cy3 vs Cy5 data for a NCI mAdb mouse array sample. C) Scatter plot of individual Cy3 channel of HP-X compared with Cy3 channel of HP-Y for ratio Cy3/Cy5 data hybridized samples. D) Scatter plot of individual Cy3 channel of HP-X compared with Cy5 channel of HP-Y for ratio Cy3/Cy5 data hybridized samples.
The Intensity selection plots a histogram of the gene intensity data values for each Filtered spot (gene) in the current hybridized sample.
The Histograms submenu includes:
If Cy3/Cy5 ratio data is being analyzed, then the F1F2 histogram menu entry becomes
The following figure illustrates the histograms. You may use the histogram to specify ranges of [I1:I2] or [R1:R2] for data filtering by selecting the corresponding histogram bins. This is described in the figure legend.
Figure 2.4.4.3 Histogram plots. A) Ratio histogram of HP-X/HP-Y data with particular histogram bin selected with the constraint set to filter all genes > that bin. HP-X is 13 day pregnancy C57B6 and HP-Y is day 1 lactation. The selected bin thresholds are then used in the Filter with the resulting Filtered genes shown in the array image. B) Zdiff histogram of HP-X - HP-Y 'sets' for same data as (A) but with the >< threshold constraint set to find genes outside of the symmetric histogram range. C) Intensity histogram of HP-X data filtered by [I1:I2] intensity range. As with ratio histograms, you can do additional filtering by selecting a particular histogram bin that is then used in the Filter. Filtering was disabled for the intensity histogram. To apply the filter, the "Don't re-Filter" button would be toggled to the "Re-Filter" state. The threshold constraints include: =, >, <, >, <>, and ><. Note that each time you click on the "Thr:" button, it cycles to the next option in the threshold constraints list.
You may generate as many individual expression profile plots as you want using the Display a gene's expr. profile for HP-E command. However, only the last one will be active and will be updated with different genes as you click on them in the microarray image or scatter plot. This could be used to compare the EP plots for several different genes. First view the EP plot for one gene, then create a new EP plot for the second gene, etc.
If you use the Display Filtered genes expr. profiles for HP-E command, it will generate a scrollable list of expression profile plots for all of the genes passing the Filter. If the number of genes is very large, it may take a while.
You may interrogate a line corresponding to a particular HP sample in an EP plot by moving the mouse over the line and then selecting the line. This will cause the name of the HP, its intensity and CV to appear in the plot. If the Err check box is set, then the mean of the intensity is indicated by a short horizontal bar and the +- CV by red vertical error bars above and below the mean. Pressing the plot style Line button repeatedly cycles the plot style through: Line (vertical bars for each point), Circle (small circles at each point), and Curve (circles connected as a continuous curve of all samples). In the case of mean expression profiles used in K-means clustering, the standard deviation is used in place of the CV value. The various clustering methods have EP plots buttons. When they are invoked, the scrollable list of EP plots is sorted by the clustering method ordered list of genes. This enables you to view the data in the same order as that produced by the cluster analysis. If the zoom nnX button is pressed, then all of the plots are magnified by nn-fold to make low intensity plots more visible. Pressing the button repeatedly cycles through: 1X, 2X, 5X, 10X and 20X. It does not change the data itself. The Show HP names button pops up a numbered list of all HP entries used in the expression profile. If you are in stand-alone mode, a SaveAs GIF button will also be available for the EP overlay mode (Figure 2.4.4.4.1) or individual EP plot. This saves the current plot as a full resolution GIF file specified by the user in a popup file browser window.
The Expression profile plots submenu contains:
Figure 2.4.4.4 Expression profile plots. A) Individual expression profile plots may be created by clicking on any gene. Multiple instances may also be created. Here we show some of the presentation options for the 38 sample MGAP database. Error bars are computed for the standard error for that sample. There are three different plotting options: line, circle and curve. #1 is the default line plot with error bars. #2 is the line plot without the error bars but clicking on line 7 to find out which sample it is and what the intensity value is. #3 is the circle plot with error bars, and #4 is the curve plot without error bars. Window #5 shows the list of samples corresponding to the 38 points in the EP plots. B) List of EPplots of the oncogenes and proto-oncogenes in the database (set by the guesser with "onco" and "Set E.G.L." and the Edited Gene List Filter). The list would become scrollable if there were more than 10 profiles. Setting the current gene would scroll the list to the EPplot for the current gene.
Figure 2.4.4.4.1 Expression profile plots. A) Scrollable list of EP plots of Filtered named genes centered at Carbonic anhydrase III. B) Overlay plot of all named Filtered genes. C) Overlay plot of all ONCO or PROTO-ONCO genes with the draw EGL option active so the graphs are drawn for these genes.
When enabled, cluster data appears as blue circles or squares drawn as overlays on the pseudoarray image. These options are discussed in the section on clustering.
Cluster analysis plots include finding a subset of genes or subsets of samples based on cluster analysis of expression profile similarity measures. These show genes belonging to particular clusters, or genes that cluster well with specified genes. Cluster methods include: finding genes similar to the current selected gene within a "distance" threshold; K-means-like clustering where you specify a seed gene and the number of clusters; and hierarchical clustering with clustergram and dendrogram graphics.
Figure 2.4.5 Cluster Menu options. The hierarchical clustering option is being selected.
There are many methods for doing clustering - each with advantages and disadvantages. We present three methods in MAExplorer and plan on adding a variety of more powerful methods through the MAEPlugin facility under development.
These methods may find genes belonging to particular clusters or genes that cluster well with particular genes. Gene clusters are sets of genes whose expression profiles are found to be similar according to a particular metric. We now define what we mean by "similar". The ordered list of hybridized samples used in computing the expression profiles is the HP-E list. MAExplorer has two different dissimilarity measures for Cij: the Euclidean distance LSQdistij and the Pearson correlation coefficient rij. These are computed as follows and are tested against the cluster distance threshold (set by the slider in the preferences sliders). Let n = |HP-E|, the number of samples in the expression profile. We define similarity as (1.0 - normalized dissimilarity).
Hint: when working with very large data sets with many samples, it may be useful to pre-adjust the distance and/or number of clusters threshold sliders to an approximate range using the (Edit Menu | Preferences | Adjust all Filter threshold scrollers). This is because once the clustering starts, it does not (currently) let you abort the clustering to change the threshold value.
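$$\mathrm{LSQdist}_{ij} = \frac{1}{n}\sqrt{\sum_{h} \left(D'_{hj} - D'_{hi}\right)^{2}}$$

for h in HP-E and i, j in Filtered genes, i not j.

Let

$$\mathrm{sum}_{ij} = \sum_{h} D'_{hj}\,D'_{hi}, \quad \mathrm{mn}_{i} = \frac{1}{n}\sum_{h} D'_{hi}, \quad \mathrm{mn}_{j} = \frac{1}{n}\sum_{h} D'_{hj}, \quad \mathrm{sumSq}_{i} = \sum_{h} D'_{hi}\,D'_{hi}, \quad \mathrm{sumSq}_{j} = \sum_{h} D'_{hj}\,D'_{hj},$$

then

$$r_{ij} = \frac{\mathrm{sum}_{ij} - n\,\mathrm{mn}_{i}\,\mathrm{mn}_{j}}{\sqrt{\mathrm{sumSq}_{i} - n\,\mathrm{mn}_{i}^{2}}\;\sqrt{\mathrm{sumSq}_{j} - n\,\mathrm{mn}_{j}^{2}}}$$

for h in HP-E and i, j in Filtered genes, i not j.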
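As a minimal sketch of the two measures above, the following Python functions compute them for a pair of expression profiles given as equal-length lists of normalized intensities, ordered as in the HP-E list. The function names are illustrative and are not part of MAExplorer. The correlation is written in the standard Pearson form (each denominator term is sumSq - n*mn*mn), which is an assumption about the intended computation rather than code taken from the program.

```python
import math


def lsq_distance(ep_i, ep_j):
    """Euclidean distance between two expression profiles, divided by n."""
    n = len(ep_i)
    if n == 0 or n != len(ep_j):
        raise ValueError("profiles must be non-empty and of equal length")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(ep_i, ep_j))) / n


def pearson_r(ep_i, ep_j):
    """Pearson correlation coefficient of two expression profiles."""
    n = len(ep_i)
    if n == 0 or n != len(ep_j):
        raise ValueError("profiles must be non-empty and of equal length")
    mn_i = sum(ep_i) / n
    mn_j = sum(ep_j) / n
    sum_ij = sum(a * b for a, b in zip(ep_i, ep_j))
    sum_sq_i = sum(a * a for a in ep_i)
    sum_sq_j = sum(b * b for b in ep_j)
    # max(0, ...) guards against tiny negative values from rounding.
    var_i = max(0.0, sum_sq_i - n * mn_i * mn_i)
    var_j = max(0.0, sum_sq_j - n * mn_j * mn_j)
    denominator = math.sqrt(var_i) * math.sqrt(var_j)
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return (sum_ij - n * mn_i * mn_j) / denominator
```

For example, `lsq_distance([0, 0], [3, 4])` is the Euclidean distance 5 divided by n = 2.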
The Cluster plots submenu contains a number of clustering methods. Pressing the Escape key during a long cluster operation will abort the operation. If you are in stand-alone mode using the ClusterGram, a SaveAs GIF button will also be available for saving the current plot as a full resolution GIF file specified by the user in a popup file browser window.
The Hierarchical Cluster plots submenu contains:
Figure 2.4.5.1 Similar genes clustered to the current gene. This method finds all genes that are similar to the current gene as those defined by their distance between expression profiles being less than the threshold set by the user. Each gene that passes the cluster distance threshold test is indicated in the image with a blue square where the size of the square is proportional to its similarity. This data is from the 38 samples in the MGAP database containing duplicated spots. A) Main windows with popup cluster similarity report and cluster distance threshold slider. B) Scrollable list of EPplots of similar genes with the red error bars indicating the variation for duplicated spots for each HP sample. The Err checkbox may turn the error bar overlays on and off.
For both of these commands, if you want to view the expression profile plots, click on the EP plot button in the cluster window and it pops up the scrollable expression profiles window. If you click on a gene in the image, it will select it as the new current gene and seed gene and recompute the cluster of genes most similar to the new seed gene.
For both of these commands, if you want a permanent report, click on the "Cluster Report" button in the cluster window and it will generate a report in the current modality (i.e. scrollable spreadsheet or tab-delimited). You may switch between these two modes by pressing the "Go '...'" button in the report.
Figure 2.4.5.2 Display of cluster counts for all genes less than the cluster threshold from MGAP 38 sample database. The algorithm counts the number of similar genes for each Filtered gene and draws a blue circle whose size is proportional to the number of genes similar to that gene. That is why there are a larger number of the larger circles.
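The two threshold-based methods shown in Figures 2.4.5.1 and 2.4.5.2 can be sketched as follows. This is a hedged illustration only: the `profiles` dictionary (gene name to expression profile), the function names, and the use of the n-scaled Euclidean distance are assumptions made for the example, not MAExplorer's actual data structures.

```python
import math


def lsq_distance(ep_i, ep_j):
    """Euclidean distance between two profiles, divided by the profile length."""
    n = len(ep_i)
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(ep_i, ep_j))) / n


def similar_genes(profiles, seed, threshold):
    """Genes whose distance to the seed gene is below the threshold."""
    seed_profile = profiles[seed]
    result = {}
    for gene, profile in profiles.items():
        if gene == seed:
            continue
        distance = lsq_distance(seed_profile, profile)
        if distance < threshold:
            result[gene] = distance
    return result


def cluster_counts(profiles, threshold):
    """For every gene, count how many other genes lie within the threshold."""
    return {
        gene: len(similar_genes(profiles, gene, threshold))
        for gene in profiles
    }
```

In the display described above, the first function would drive the size of the blue squares (via similarity) and the second the size of the blue circles (via the count).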
Figure 2.4.5.3 Genes clustered using the K-means cluster method. A) Using the current gene as the initial cluster, MAExplorer finds N orthogonal clusters assigning the set of filtered genes to these clusters using the HP-E expression profiles. All genes are iteratively assigned to these clusters. Genes belonging to the current cluster are labeled with a green cluster number both in the array and in the scatter plot. The slider determines the number of clusters (set to 6 here). A 2D scatter plot shows the genes belonging to cluster 6. The K-means cluster report on the right contains a sorted list of the genes in each cluster and has buttons to generate EP plots and reports as well as summary mean EP plots (shown) and mean cluster reports. The detailed list is shown below. B) Part of the scrollable EP plots for this data showing genes belonging to both clusters #5 and #6. C) The mean EP plots for the 6 clusters.
Cluster report for 6 K-means clusters with 141 genes being clustered. The seed gene is [1248564] Jun-B oncogene.

Clone ID | Similarity | Cluster-# | Distance-to-cluster | Gene-Name |
1248411 | ************** | 1 | Cluster [26 genes] in cluster [distNext: 1.035] wiCdist:mn+-sd=1.223+-0.453 CV=0.371 | Calpactin I light chain |
1381592 | ********** | 1 | 0.448 | Surfeit gene 4 |
1247956 | ********* | 1 | 0.706 | Protein kinase, cAMP dependent, catalytic, beta |
1381836 | ******** | 1 | 0.761 | Prohibitin |
1382325 | ******** | 1 | 0.771 | M.musculus mRNA for C1D protein |
1248270 | ******** | 1 | 0.775 | Seven in absentia 1A |
1247716 | ******** | 1 | 0.794 | Lipoprotein lipase |
1248184 | ******** | 1 | 0.847 | Mus musculus bromodomain-containing protein BP75 mRNA, complete cds |
1248564 | ******* | 1 | 0.864 | Jun-B oncogene |
1382667 | ******* | 1 | 0.888 | SERINE/THREONINE PROTEIN PHOSPHATASE PP2A-BETA, CATALYTIC SUBUNIT |
1382561 | ******* | 1 | 0.931 | Mus musculus GTP-specific succinyl-CoA synthetase beta subunit (Scs) mRNA, partial cds |
1248089 | ****** | 1 | 1.013 | M.musculus RPS3a gene |
1247780 | ****** | 1 | 1.088 | Proprotein convertase subtilisin/kexin type 7 |
1247557 | ****** | 1 | 1.104 | M.musculus L28 mRNA for ribosomal protein L28 |
1248321 | ***** | 1 | 1.278 | Decay accelerating factor 1 |
1382751 | **** | 1 | 1.311 | Clusterin |
1382007 | **** | 1 | 1.357 | Murine mRNA with homology to yeast L29 ribosomal protein gene |
1382074 | **** | 1 | 1.390 | Orosomucoid 1 |
1381963 | **** | 1 | 1.417 | M.musculus mRNA for ribosomal protein L36 |
1248278 | ** | 1 | 1.658 | HISTONE H3.3 |
1247630 | ** | 1 | 1.675 | Procollagen, type I, alpha 2 |
1247865 | * | 1 | 1.837 | Mouse beta-D-galactosidase fusion protein mRNA, complete cds |
1382236 | * | 1 | 1.85 | Caspase 7 |
1247833 | | 1 | 1.882 | Mus musculus radio-resistance/chemo-resistance/cell cycle checkpoint control protein (Rad9) mRNA, complete cds |
1248535 | | 1 | 1.953 | M.musculus mRNA for selenoprotein P |
1247702 | | 1 | 2.157 | Cytochrome C oxidase, subunit Va |
1382282 | ************** | 2 | Cluster [13 genes] in cluster [distNext: 24.199] wiCdist:mn+-sd=16.184+-6.667 CV=0.412 | Max interacting protein 1 |
1382159 | ********** | 2 | 9.086 | TRANSPLANTATION ANTIGEN P35B |
1247854 | ********* | 2 | 11.002 | Prolyl 4-hydroxylase, beta polypeptide |
1247970 | ******** | 2 | 11.786 | Mouse mRNA for osteoblast specific factor 2 (OSF-2) |
1381663 | ******** | 2 | 12.948 | Mus musculus vacuolar adenosine triphosphatase subunit A gene, complete cds |
1382100 | ******** | 2 | 13.34 | T-complex protein 1, related sequence 1 |
1248366 | ******** | 2 | 13.541 | Mus musculus cytochrome c oxidase subunit VIIa-L precursor (Cox7al) mRNA, nuclear gene encoding mitochondrial protein, complete cds |
1247568 | ******** | 2 | 13.762 | Cathepsin D |
1247872 | ******* | 2 | 14.015 | Mus musculus endothelial monocyte-activating polypeptide I mRNA, complete cds |
1382333 | ******* | 2 | 14.065 | Stromal cell derived factor 5 |
1382008 | ******* | 2 | 15.985 | Mus musculus FK-506 binding protein homolog (SAM11) mRNA, complete cds |
1247724 | **** | 2 | 21.964 | Glutathione-S-transferase, alpha 3 |
1247846 | | 2 | 34.704 | House mouse; Musculus domesticus kidney mRNA for Phosphatidic acid phosphatase, complete cds |
1247945 | ************** | 3 | Cluster [22 genes] in cluster [distNext: 11.979] wiCdist:mn+-sd=7.559+-3.347 CV=0.443 | Mus musculus mRNA for DEDD protein |
1247797 | ********** | 3 | 4.159 | Mus musculus Btk locus, alpha-D-galactosidase A (Ags), ribosomal protein (L44L), and Bruton's tyrosine kinase (Btk) genes, complete cds |
1382087 | ********** | 3 | 4.494 | Cell division cycle 42 |
1247539 | ********** | 3 | 4.511 | EST |
1248212 | ********** | 3 | 5.009 | Murine mRNA for integrin beta subunit |
1248470 | ********** | 3 | 5.044 | EST |
1247521 | ********* | 3 | 5.299 | Mus musculus mRNA for peroxisomal integral membrane protein PMP34 |
1381808 | ********* | 3 | 5.924 | Mus musculus UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase-T3 mRNA, complete cds |
1381970 | ********* | 3 | 6.285 | Mus musculus thioredoxin mRNA, nuclear gene encoding mitochondrial protein, complete cds |
1382168 | ********* | 3 | 6.343 | N-terminal Asn amidase |
1382704 | ********* | 3 | 6.36 | Mus musculus N-myristoyltransferase 1 mRNA, complete cds |
1248548 | ********* | 3 | 6.378 | Mus musculus WDR protein mRNA, complete cds |
1247564 | ******** | 3 | 6.652 | Erythrocyte protein band 7.2 |
1248588 | ******** | 3 | 6.67 | M.musculus BAP31 mRNA |
1247541 | ******** | 3 | 6.690 | Apolipoprotein D |
1248462 | ******** | 3 | 7.322 | Sterol O-acyltransferase 1 |
1248462 | ******** | 3 | 7.42 | Sterol O-acyltransferase 1 |
1248521 | ****** | 3 | 9.121 | Mus domesticus nuclear binding factor NF2d9 mRNA, complete cds |
1382212 | ****** | 3 | 10.137 | Thyroid autoantigen 70 kDa |
1382270 | ***** | 3 | 10.529 | Voltage-dependent anion channel 2 |
1248152 | ***** | 3 | 10.541 | M. musculus mRNA for MAP kinase-activated protein kinase 2 |
1247678 | | 3 | 19.431 | Casein alpha |
1247543 | ************** | 4 | Cluster [44 genes] in cluster [distNext: 1.035] wiCdist:mn+-sd=0.439+-0.266 CV=0.606 | RAS-related C3 botulinum substrate 1 |
1381923 | ************ | 4 | 0.158 | Prolyl 4-hydroxylase, beta polypeptide |
1382052 | ************ | 4 | 0.209 | Trans-acting transcription factor 1 |
1247882 | *********** | 4 | 0.237 | Mus musculus AMP activated protein kinase mRNA, complete cds |
1248099 | *********** | 4 | 0.246 | Mus musculus mitogen-responsive 96 kDa phosphoprotein p96 mRNA, alternatively spliced p67 mRNA, and alternatively spliced p93 mRNA, complete cds |
1248351 | *********** | 4 | 0.251 | Abl-interactor 1 |
1247540 | *********** | 4 | 0.255 | Mus musculus mRNA for ZIP-kinase, complete cds |
1248316 | *********** | 4 | 0.26 | Mus musculus proteasome alpha7/C8 subunit mRNA, complete cds |
1382671 | *********** | 4 | 0.264 | Mouse MA-3 (apoptosis-related gene) mRNA, complete cds |
1382014 | *********** | 4 | 0.277 | Transcription elongation factor B (SIII), polypeptide 1 (15 kDa),-like |
1247885 | *********** | 4 | 0.289 | Mus musculus mRNA for ryudocan core protein, complete cds |
1248294 | *********** | 4 | 0.292 | Mus musculus thioredoxin-related protein mRNA, complete cds |
1382066 | *********** | 4 | 0.306 | Inhibitor of DNA binding 2 |
1248597 | *********** | 4 | 0.307 | Lipocortin 1 |
1248591 | *********** | 4 | 0.324 | Interferon beta, fibroblast |
1248445 | ********** | 4 | 0.333 | Mus musculus beta prime coatomer protein mRNA, partial cds |
1247775 | ********** | 4 | 0.34 | House mouse; Musculus domesticus male brain mRNA for ARF1, complete cds |
1382750 | ********** | 4 | 0.340 | Thymoma viral proto-oncogene |
1247905 | ********** | 4 | 0.341 | Monokine induced by gamma interferon |
1381668 | ********** | 4 | 0.351 | Mus musculus mitogen-activated protein kinase-activated protein kinase mRNA, complete cds |
1381811 | ********** | 4 | 0.356 | Protein tyrosine phosphatase, receptor type, D |
1382031 | ********** | 4 | 0.358 | Protease (prosome, macropain) 28 subunit, beta |
1248345 | ********** | 4 | 0.363 | Mus musculus alpha-methylacyl-CoA racemase mRNA, complete cds |
1382555 | ********** | 4 | 0.364 | Lysosomal membrane glycoprotein 1 |
1247820 | ********** | 4 | 0.367 | Tight junction protein 1 |
1247598 | ********** | 4 | 0.374 | Retinoblastoma 1 |
1247595 | ********** | 4 | 0.378 | PROBABLE CALCIUM-BINDING PROTEIN PMP41 |
1381928 | ********** | 4 | 0.379 | Mus musculus MRJ (Mrj) mRNA, complete cds |
1248196 | ********** | 4 | 0.399 | Max protein |
1381691 | ********** | 4 | 0.423 | SRY-box containing gene 17 |
1248225 | ********** | 4 | 0.434 | Mus musculus heat shock transcription factor 1 (Hsf1) gene, partial cds |
1248084 | ********** | 4 | 0.442 | Mus musculus Supl15h gene |
1247941 | ********* | 4 | 0.453 | Fibroblast growth factor inducible 14 |
1381623 | ********* | 4 | 0.468 | Stearoyl-coenzyme A desaturase 1 |
1248202 | ********* | 4 | 0.473 | Mouse mRNA for PAP-1, complete cds |
1382115 | ********* | 4 | 0.512 | GLUTATHIONE S-TRANSFERASE GT8.7 |
1382044 | ********* | 4 | 0.515 | Cartilage derived retinoic acid sensitive protein |
1381636 | ******** | 4 | 0.567 | Lymphotoxin B |
1381920 | ******** | 4 | 0.569 | Mus musculus mRNA for NEFA protein, complete cds |
1247757 | ******** | 4 | 0.596 | Granzyme B |
1382094 | ******** | 4 | 0.609 | High mobility group protein 1 |
1247545 | ******** | 4 | 0.638 | Carbon catabolite repression 4 homolog (S. cerevisiae) |
1247607 | *** | 4 | 1.188 | POLYADENYLATE-BINDING PROTEIN |
1247727 | | 4 | 1.667 | Malate dehydrogenase, mitochondrial |
1248244 | ************** | 5 | Cluster [19 genes] in cluster [distNext: 3.473] wiCdist:mn+-sd=4.273+-2.059 CV=0.482 | CD80 antigen |
1248534 | ********** | 5 | 1.648 | Carbonyl reductase |
1247764 | ********** | 5 | 1.776 | H-2 CLASS II HISTOCOMPATIBILITY ANTIGEN, GAMMA CHAIN |
1381933 | ********* | 5 | 2.345 | Mouse rpS17 mRNA for ribosomal protein S17, complete cds |
1381616 | ********* | 5 | 2.42 | Mus musculus oral tumor suppressor homolog (Doc-1) mRNA, partial cds |
1248232 | ********* | 5 | 2.486 | Mus musculus putative glycogen storage disease type 1b protein mRNA, complete cds |
1382644 | ******** | 5 | 2.717 | Cyclin G |
1248125 | ******** | 5 | 2.791 | Histocompatibility 2, class II, locus Mb2 |
1247799 | ******** | 5 | 2.869 | Mus musculus signal recognition particle receptor beta subunit mRNA, complete cds |
1247708 | ******** | 5 | 3.024 | Ephrin A1 |
1247932 | ****** | 5 | 4.235 | Mus musculus (clone: pMAT1) mRNA, complete cds |
1382515 | ***** | 5 | 4.668 | ATPase, Na+/K+ beta 3 polypeptide |
1248586 | ***** | 5 | 4.838 | Mus musculus viral envelope like protein (G7e) gene, complete cds |
1248198 | *** | 5 | 5.874 | Mus musculus D9 splice variant 2 mRNA, complete cds |
1381623 | ** | 5 | 6.224 | Stearoyl-coenzyme A desaturase 1 |
1382086 | * | 5 | 6.885 | Mus musculus (strain C57Bl/6) mRNA sequence |
1247887 | * | 5 | 7.014 | Mouse chromosome 6 BAC-284H12 (Research Genetics mouse BAC library) complete sequence |
1247886 | | 5 | 7.810 | Cut (Drosophila)-like 1 |
1248303 | | 5 | 8.094 | Lipopolysaccharide response |
1247621 | ************** | 6 | Cluster [17 genes] in cluster [distNext: 19.157] wiCdist:mn+-sd=12.410+-3.024 CV=0.244 | Mus musculus Lsc (lsc) oncogene mRNA, complete cds |
1248050 | ******* | 6 | 7.407 | Mus musculus C57BL/6J ribosomal protein S28 mRNA, complete cds |
1247698 | ******* | 6 | 7.571 | Adipocyte protein aP2 |
1248240 | ***** | 6 | 9.198 | Mus musculus mRNA, complete cds |
1247862 | **** | 6 | 9.844 | Mus musculus Nmi mRNA, complete cds |
1382162 | **** | 6 | 10.330 | CAMP responsive element modulator |
1248398 | *** | 6 | 11.007 | Mouse mRNA for ribosomal protein S12 |
1248281 | *** | 6 | 11.143 | M.musculus mRNA for histone H3.3A |
1247852 | *** | 6 | 11.576 | Twist gene homolog, (Drosophila) |
1381991 | ** | 6 | 12.809 | Prolyl 4-hydroxylase, beta polypeptide |
1382753 | ** | 6 | 13.019 | Mus musculus cleavage and polyadenylation specificity factor (MCPSF) mRNA, complete cds |
1248368 | * | 6 | 13.639 | Mus musculus ribosomal protein S26 (RPS26) mRNA, complete cds |
1247639 | * | 6 | 13.692 | SRY-box containing gene 4 |
1248435 | | 6 | 14.262 | Thymus cell antigen 1, theta |
1247961 | | 6 | 14.75 | ATP SYNTHASE ALPHA CHAIN, MITOCHONDRIAL PRECURSOR |
1248344 | | 6 | 15.217 | Gut enriched Kruppel-like factor |
1382234 | | 6 | 16.351 | CD8 antigen, beta chain |
We call the genes closest to the "center" of the K clusters primary genes and they are reported with additional information. The "Cluster [# genes]" entries in the distance-to-cluster fields indicate that these genes are the centers of the clusters (i.e. primary genes). The distNext is the distance from this cluster center to the next nearest K-means cluster center. The number of clusters N (6 in this example) is set in the popup state scroller. If you change the value of N, it will recompute the clusters and the primary genes.
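The K-means-like procedure and the primary-gene summary can be sketched as below. This is an illustration under stated assumptions, not MAExplorer's implementation: the text only specifies a seed gene, a number of clusters N, and iterative assignment, so the farthest-first choice of the remaining initial centers, the mean-profile update, and all names (`kmeans_like`, `summarize`, `profiles`) are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_like(profiles, seed, k, max_iter=100):
    """Cluster genes into k groups, starting from the seed gene's profile."""
    genes = list(profiles)
    if not 1 <= k <= len(genes):
        raise ValueError("k must be between 1 and the number of genes")
    centers = [list(profiles[seed])]
    # Assumption: pick each further initial center as the gene farthest
    # from all centers chosen so far.
    while len(centers) < k:
        farthest = max(
            genes, key=lambda g: min(dist(profiles[g], c) for c in centers)
        )
        centers.append(list(profiles[farthest]))
    assign = {}
    for _ in range(max_iter):
        new_assign = {
            g: min(range(k), key=lambda c: dist(profiles[g], centers[c]))
            for g in genes
        }
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        for c in range(k):
            members = [profiles[g] for g in genes if assign[g] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


def summarize(profiles, assign, centers):
    """Report the primary gene, size and distNext of each non-empty cluster."""
    report = []
    for c, center in enumerate(centers):
        members = [g for g in assign if assign[g] == c]
        if not members:
            continue
        primary = min(members, key=lambda g: dist(profiles[g], center))
        others = [dist(center, o) for j, o in enumerate(centers) if j != c]
        report.append({
            "cluster": c + 1,  # reports number clusters from 1
            "primary_gene": primary,
            "n_genes": len(members),
            "dist_next": min(others) if others else None,
        })
    return report
```

Changing `k` and calling both functions again mirrors what happens when the number-of-clusters scroller is moved: the clusters and the primary genes are recomputed from scratch.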
MAExplorer draws magenta circles around the primary genes in the microarray and the cluster number to the right of each circle. The size of a circle corresponds to the number of genes clustered with that primary gene. If you click on a gene belonging to any cluster, it defines that cluster as the "current cluster". It will change the labels of the subset of genes that belong to the current cluster from a red (white) circle to a green (yellow) cluster number of the current cluster in the intensity (ratio) pseudoarray image. In addition, the 'edited gene list' is set to the subset of genes that belong to the current cluster. If you are also displaying a scatter plot, genes in the current cluster have their red '+' characters changed to the cluster number.
Clustering is represented by a binary tree and is visualized as an ordered gene clustergram and optional dendrogram sub-plot. This is similar to the methods of (DeRisi, 1996), (Eisen, 1998), and (White, 1999). Currently, MAExplorer does 1-way clustering - not the 2-way clustering of (Weinstein, 1998) and (Eisen, 1998). Each row of the clustergram represents a gene and each column represents an HP in the HP-E list of samples. Each box in a row represents the normalized expression of that gene for the HP represented in that column. The color of the box is one of 9 colors representing the normalized expression ranges and is assigned according to the following table:
Table 2.4.5.4. ClusterGram pseudocolor assignments. The colors are assigned to "box" entries in the clustergram corresponding to genes. The color represents data as either the X/Y ratio or X-Y Zdiff relative to the normalizing HP.
Color | bright green | . | . | dark green | black | dark red | . | . | bright red |
Range | <1/8X | 1/6X | 1/4X | 1/2X | 1X | 2X | 4X | 6X | >8X |
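One way to picture the table is as a lookup from an X/Y ratio to one of the nine range labels. The sketch below is only an illustration: the table does not give the exact bin edges, so the rule used here (ratios below 1/8 or above 8 take the extreme labels, everything else takes the nearest inner label on a log2 scale) is an assumption, as is the function name. It covers the ratio case only, not the X-Y Zdiff case.

```python
import math

# Range labels from Table 2.4.5.4, from lowest to highest ratio.
LABELS = ["<1/8X", "1/6X", "1/4X", "1/2X", "1X", "2X", "4X", "6X", ">8X"]
# Nominal ratios of the seven inner labels.
INNER = [1 / 6, 1 / 4, 1 / 2, 1.0, 2.0, 4.0, 6.0]


def clustergram_bin(ratio):
    """Return the table's range label for a positive X/Y ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio < 1 / 8:
        return LABELS[0]
    if ratio > 8:
        return LABELS[-1]
    log_ratio = math.log2(ratio)
    # Assumed rule: nearest inner label on a log2 scale.
    nearest = min(
        range(len(INNER)), key=lambda k: abs(math.log2(INNER[k]) - log_ratio)
    )
    return LABELS[nearest + 1]
```

Working on a log scale treats a 2-fold increase and a 2-fold decrease symmetrically, which matches the symmetric green/red layout of the table.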
Figure 2.4.5.4 Hierarchical clustering clustergram of genes filtered by ratio histogram bins for 19 samples from the MGAP data set. The hybridized samples are drawn as colored boxes in the 19 columns. Rows of boxes correspond to gene expression profiles. In A), the set of all genes and ESTs was filtered by the CV filter set to 0.387 and the normalization was the Zscore. The gene "Mus musculus D9 splice variant 2 mRNA, complete cds" was selected as the current gene in the clustergram. Data for this gene and the selected HP column is indicated at the top of the clustergram. The list of the 19 samples is shown on the left. B) Details of clustergram and dendrogram are shown where the user had selected a cluster distance threshold at "Mouse mRNA for mitochondrial cytochrome c oxidase subunit Vb" in the dendrogram part of the plot (zoomed by 2X). With this selection, all parts of the dendrogram tree that are less than this distance are drawn in red. C) shows the manual selection of genes from the ClusterGram or Dendrogram by clicking on the gene names you wish to capture in the Edited Gene List (EGL) while the Control key is pressed. The zoomed subregion shows three genes in the same cluster that were selected (magenta stars in the right edge of the ClusterGram).
Figure 2.4.6 Reports menu. You may create either dynamic or tab-delimited text reports of either Samples or of subsets of genes.
These may be presented as interactive dynamic tables as well as scrollable text windows capable of being exported to Excel. If Web DB access is enabled, clicking on an entry will bring up a Web browser with access to GenBank data. If the report contains Clone ID as one of the fields, you can click on it to have it define that gene as the current gene and highlight it in the microarray image or scatter plot (if it is being used). The reports are divided into two types - those dealing with lists of arrays (i.e. the sample experimental condition) and those dealing with lists of genes.
The Report menu includes:
The "Samples vs Samples correlation coefficients" report computes the correlation coefficients, shown as an upper diagonal matrix, for the current set of Filtered genes to indicate the similarity between HP samples. The entries are of the following form, where HP:1 and HP:2 correspond to samples listed in the field names of the table and the data are the intensity values using the current normalization method.
rSq=0.748, n=1656, HP:1(mn+-sd)=(28991+-19564), HP:2(mn+-sd)=(5044+-9766)
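As a rough sketch of how one such entry could be produced, the function below takes the normalized intensities of the Filtered genes in two samples and formats the result like the example above. It assumes that rSq is the square of the Pearson correlation coefficient and that sd is the sample standard deviation; neither detail is stated in this manual, and the function name is hypothetical.

```python
import statistics


def correlation_entry(x, y, name_x="HP:1", name_y="HP:2"):
    """Format one sample-vs-sample entry from two intensity lists."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length lists with at least 2 genes")
    mn_x, mn_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant data")
    # Sample covariance, then Pearson r.
    cov = sum((a - mn_x) * (b - mn_y) for a, b in zip(x, y)) / (n - 1)
    r = cov / (sd_x * sd_y)
    return (
        f"rSq={r * r:.3f}, n={n}, "
        f"{name_x}(mn+-sd)=({mn_x:.0f}+-{sd_x:.0f}), "
        f"{name_y}(mn+-sd)=({mn_y:.0f}+-{sd_y:.0f})"
    )
```

Filling the upper diagonal matrix then amounts to calling this function once for every pair of samples (a, b) with a listed before b.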
The "Calibration DNA summary" table contains the computed means, std-dev, and computed normalization scale factor for all active hybridized samples. The scale factors are used if the 'Calibration DNA' normalization is used.
You must set the Web access checkbox if you want to click on a blue hyperlink in the resulting report to access an associated Web database.
Figure 2.4.6.1 Hybridized samples dynamic Report windows. A) Samples Info report. B) Sample Web links. Clicking on a blue hypertext link brings up the corresponding genomic Web database entry in a separate Web browser window if the Web access is enabled. The tab-delimited version of the same reports (not shown) may be cut and then pasted into other programs such as an Excel spreadsheet. C) HP vs HP correlation table on genes passing the data Filter for all samples in the HP-E list.
If Cy3/Cy5 ratio data is being analyzed, then the Highest (Lowest) F1/F2 entries become
Figure 2.4.6.2 Gene Report windows of 50 named genes with highest HP-X/HP-Y 'set' ratios. A) Dynamic gene report of 50 genes with highest HP-X/HP-Y 'set' ratios. A similar report may be generated for the lowest ratios or for single HP-X/HP-Y samples. This type of report may be generated for the highest or lowest Zdiff values when the Zscore normalizations are used. Clicking on a blue hypertext link brings up the corresponding genomic Web database entry in a separate Web browser window if the Web access is enabled. It also sets the current gene to the gene for that row. B) The tab-delimited version of the same report may be cut and then pasted into other programs such as an Excel spreadsheet.
Figure 2.5 View Menu options. These are divided into various options for modifying the presentation as well as recording activity such as the messages or history popup scrollable log windows.
Figure 2.5. Popup genomic browser database page. A) The UniGene Web page pops up in a new Web browser window when the user clicks on a gene in the array image, 2D scatter plot or Report while the "Display current gene in Unigene Web Browser" toggle is enabled in the View menu. The current gene was "Jun-B oncogene". B) Alternatively, the mAdb Gene DB may be selected, as may the GenBank or dbEST genomic databases. C) Data from the NCBI LocusLink database may also be accessed if either the GenBank ID or LocusID is available.
Figure 2.5.2 Examples of messages and command history popup log windows. Measurements and other activity are shown in more detail in the messages window whereas the command history indicates commands (numbered in the order they are executed) in the command history window. Data from either of these windows may be saved in text log files.
Figure 2.6 MAEPlugins paradigm. If you have a MAEPlugin .jar file, then it may be specified using the "Load plugin" command. When you invoke the plugin's command from the menus (or by other methods), it accesses any data it needs from the current MAExplorer database through the Open Java API.
The Plugins menu includes:
The Save RLO reports in time-stamped Report/ folder [CB] option puts files generated by R from successive executions of the same RLO into separate sub-folders in the Report/ folder with names "RLOname-YYMMDD-HHMMSS/" to keep the data separate. This is useful when you want to compare results from the same RLO method but with different MAExplorer preprocessing.
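The naming pattern can be illustrated with a few lines of Python. This is only a sketch of the "RLOname-YYMMDD-HHMMSS/" convention described above; the function name and the choice to return a relative path are assumptions for the example, and MAExplorer itself creates these folders for you.

```python
from datetime import datetime


def rlo_report_folder(rlo_name, when=None):
    """Build a time-stamped sub-folder name inside the Report/ folder."""
    if when is None:
        when = datetime.now()
    # %y%m%d-%H%M%S gives a two-digit year, e.g. 040309-140507.
    return f"Report/{rlo_name}-{when.strftime('%y%m%d-%H%M%S')}/"
```

Because the time stamp sorts chronologically, the sub-folders of successive runs of one RLO line up in the order in which they were executed.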
You may download the latest versions of all plugins using the (File | Update Plugins from maexplorer.sourceforge.net) menu command. Similarly, you can update your versions of the RLO methods using the (File | Update RLO methods from maexplorer.sourceforge.net) menu command.
Figure 2.6.1 Loading a MAEPlugin from your file system using the Load Plugins command in the Plugins pull down menu. If you have a plugin .jar or .class file, it may be specified using the "Load plugin" command. This pops up a file browser to let you specify the plugin file.
Figure 2.6.2 Executing the new command previously loaded in the Plugin menu. Selecting the new "Show List Active Filters" command that now appears in the Plugins menu invokes the plugin. This pops up a report shown in the next figure.
Figure 2.6.3 Popup window from executing the MAEPlugin. This plugin gives a full report on the data Filter status in a new pop up window.
Figure 2.6.2 Plugins menu - executing a previously loaded plugin. Plugins that do not go into particular MAExplorer submenus go into the Plugins menu. Selecting the command will invoke that MAEPlugin.
The Help menu includes:
Database-specific help menu entries - entries defined for a particular database (see below)
This section briefly addresses some of the issues you need to consider. However, a full discussion of the issues involved is beyond the scope of this manual. These issues are covered in other more focused statistical methods literature and you might also address them in consultation with biostatisticians. The Internet has vast resources for microarrays. A few to get you started might include: a microarray citation electronic library http://arrayit.com/e-library/, the National Library of Medicine PubMed journal search engine, and a general microarray Listserv GENE-ARRAYS@ITSSRV1.UCSF.EDU. The MGED group (Brazma, 2001) has published the MIAME standard, which specifies the Minimum Information About a Microarray Experiment. This information is useful in doing an analysis. Also try searching using general Internet search engines. There are a number of public microarray data repositories. One that we find useful is NCBI's GEO (Gene Expression Omnibus), which contains array data and MIAME compliant information about the arrays.
A good and appropriate experimental design (i.e. the design and setting up of experiments to subsequently be analyzed) is critical for resolving significant differences in gene expression between experimental conditions. We touch on some of the issues here. (Simon, 2001), (Dudoit, 2000), and Kerr and Churchill (2001a, 2001b) discuss some of the issues of experimental design for microarrays. We do not currently implement the Kerr-Churchill method. However, some of the issues involved in experimental design based on the types of arrays are discussed in Section 3.1.1 for (Cy3/Cy5)-labeled as well as 33P-labeled samples.
If users are comparing two different types of samples, the analysis would be different than if they were comparing an ordered sequence of samples (e.g. time series, cell cycle, dose-response, tumor-stage, etc.). MAExplorer gives users the ability to:
Briefly, data mining is the discovery of potentially interesting patterns in the data that were previously unknown. One approaches the analysis of a set of data with minimal expectations. However, some idea of what you are interested in helps focus the search. But beware of the trap of mining the data until you get the results you hope for. The following figure helps illustrate this process.
Proper experimental design of microarray experiments is critical to successful use of microarray data. Several recent reports discuss some of the key issues involved in various aspects of statistical analysis of microarrays: (Radmacher, 2001), (McShane, 2001), (Korn, 2001), (Simon, 2001), (Dudoit, 2000).
a) (Cy3X/Cy5X1) / (Cy3Y/Cy5Y1) becomes b) (Cy3X/Cy3Y)
However, this new comparison is accompanied by additional noise because of the use of the two Cy5P intermediaries.
An alternative method would be to compute (Cy3X/Cy5Y) directly. However, this too has its own sources of error and other problems, namely that not all genes are labeled symmetrically with the two dyes since different dyes may have different sequence specific affinities due to a variety of causes. For that reason, dye-swap experiments are often done. I.e. the two samples would be run as (Cy3X/Cy5Y) as well as (Cy3Y/Cy5X). If one were to plot (Cy3X/Cy5Y) against 1.0/(Cy3Y/Cy5X) and the data were perfectly symmetric (which they are not) then one would expect a straight line. That is generally not what you get in practice.
Another issue is that when you have a number of samples A, B, C, D, ..., N and wish to compare them, there are a number of alternate experimental designs you can use with different resulting sets of advantages and problems. If a common pooled Cy5P sample P were used, then the following experiments would be done:
(Cy3A/Cy5P), (Cy3B/Cy5P), ... , (Cy3N/Cy5P)
This assumes that there is enough of the pooled sample P to be used for all of the experiments - otherwise additional sources of error would be introduced. MAExplorer is ideally used with this common reference sample P. If a common pooled sample is not used, then the experimental design becomes more complicated - especially if dye-swap experiments are performed for all samples. For N samples taken 2 at a time (i.e. Cy3 and Cy5), the number of experiments may be impossibly large to perform for other than a very small N. E.g. for N of 3, the number of experiments is 3, and 6 if dye-swap experiments are also performed. For N of 4, the number of experiments is 6, and 12 with dye swaps. And this is without doing any replicate experiments. If a reasonable number of replicates is added, then this set of experiments becomes even more difficult to perform.
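The counts quoted above follow from choosing 2 of the N samples for each slide, doubled when every pair is also run dye-swapped. A minimal sketch of that arithmetic, with hypothetical function names and a `replicates` parameter added for illustration, is:

```python
from math import comb


def pairwise_experiments(n_samples, dye_swap=False, replicates=1):
    """Number of slides needed to compare every pair of samples directly."""
    pairs = comb(n_samples, 2)
    return pairs * (2 if dye_swap else 1) * replicates


def pooled_reference_experiments(n_samples, replicates=1):
    """Number of slides when each sample is run once against a pooled P."""
    return n_samples * replicates
```

The all-pairs design grows roughly as N squared, whereas the pooled-reference design grows only linearly in N, which is why the common reference sample is the simpler choice for larger studies.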
MAExplorer is currently not oriented to handling these large combinatoric types of non-pooled sets of experiments. However, you do have the ability to swap (Cy3,Cy5) data on an individual basis so you could compute an average of data from dye-swap experiments - but with the caveats of non-uniform labeling mentioned above.
[(Cy3X/Cy5Y) + 1.0/(Cy3Y/Cy5X)]/2
In general, this is probably not a very good estimate.
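Written out as code, the dye-swap average for one gene looks like the sketch below. The function and parameter names are hypothetical; the two arguments are the ratios measured on the forward slide (Cy3X/Cy5Y) and on the dye-swapped slide (Cy3Y/Cy5X).

```python
def dye_swap_average(ratio_forward, ratio_swapped):
    """Average a forward ratio with the reciprocal of its dye-swapped ratio."""
    if ratio_swapped == 0:
        raise ValueError("the dye-swapped ratio must be non-zero")
    return (ratio_forward + 1.0 / ratio_swapped) / 2
```

If labeling were perfectly symmetric, the forward ratio and the reciprocal of the swapped ratio would be equal and the average would simply reproduce them; the further apart the two terms are, the less trustworthy the averaged value is.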
Direct user manipulation of data, as incorporated in MAExplorer, was defined by (Schneiderman, 1997), who defends the position that the direct manipulation of data in data mining is an extremely effective means to amplify human creativity in understanding patterns. Schneiderman's dogma states "overview first, zoom, and then filter details on demand" and favors the use of "shallow search trees, slide controllers, and information-rich screens with tightly coordinated panel view of data" (Beardsly, 1999). MAExplorer uses many of these direct manipulation principles. It was designed to run on desktop computers with data residing on the same computer and loaded into its memory for rapid direct manipulation - for both the Web browser and stand-alone versions.
Part of the Flicker system allows comparison of user 2D gel images with standard images from SWISS-2DPROT for putative identification of unknown spots in the user gels. The user would select a standard 2D gel image from over 20 tissue types, enter their own 2D gel image and align them at spots of interest. They could then switch to a database access mode, click on those spots and generate popup SWISS-2DPROT Web pages for those proteins - similar to Clone reports in MAExplorer. That is accessed at http://www.lecb.ncifcrf.gov/flicker/swissProtIdFlkPair.html.
MAExplorer will have a groupware facility similar to what we have done with our WebGel (http://www.lecb.ncifcrf.gov/webgel/) system described in (Lemkin et al., 1999b). It is a two-dimensional electrophoresis system for sharing data analyses. In WebGel, users may perform a data-mining analysis and leave the state of their analysis and accompanying notes to share with their collaborators on a login-protected basis.
We now discuss using these tools for analyzing one's data.
Table 3.2 Steps in a data-mining analysis.
In designing a data mining experiment, the first decision to be made is selecting the set of hybridized samples to be compared (steps 1 and 2). This is accomplished by setting the current hybridized sample-X (HP-X) and hybridized sample-Y (HP-Y). In Figure 2.4.4.2 for the scatter plot we selected a single C57B6 pregnancy day 13 and a single Stat5a (-,-) pregnancy day 13 as current HP-X and current HP-Y samples. Changing the normalization changes the view in the scatter plot so that hidden differences may be more apparent (see Figure 2.4.2.3).
The names of the current HP-X and HP-Y samples are displayed at the top of the main window. The current HP-X and HP-Y samples may be changed at any time by clicking on a new sample from a list of samples shown on the left side of the main window or from lists of samples organized by sample population in the Samples menu.
The next decision to be made is selection of the genes to be studied by choosing a subset from the gene class menu list (step 4). Further selection occurs throughout the analysis by clicking on spots in microarray images, points in graphic plots or cells in spreadsheets, by adjusting threshold sliders, or using the text-entry "guesser" to type in gene names, clone IDs, genomic IDs, samples, etc.
The next decision the user must make is to set the intensity data normalization mode (step 3). Normalization of quantitative data is crucial when comparing data between different hybridized microarrays because of spotting, hybridization efficiency, uniformity, and other systematic errors.
Genes of interest may be separated from all of the genes in the database using a cascade of data filters (step 4). Additional filtering options are easily accessible in the (data) Filter menu. Some of the filters require additional parameters. These parameters are set by state scroll bars that pop up on the screen when data filters requiring them are added to the filter cascade. Changing scroller values causes the data filter to be automatically reapplied and a new set of genes to be computed.
It is desirable to reduce false-positives found by the data filter by eliminating genes with high quantification variability between duplicate spots on the same sample or between spots duplicated in replicate samples. If duplicate genes are available on the array (denoted by Field 1 and Field 2 or F1 and F2 spots), this allows the computation of a coefficient of variation (CV) for the duplicates. This CV may be used in a data filter to reduce potential false-positives. CV is computed as 2|F1-F2|/(F1+F2) using those spot values for each gene, or as StdDevHP/MeanHP for a set of replicate hybridized samples.
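The two CV formulas above can be illustrated with a small Python sketch. This is not MAExplorer code: the function names are hypothetical, intensities are assumed to be positive, and the use of the sample (n-1) standard deviation for replicate samples is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def cv_duplicate_spots(f1: float, f2: float) -> float:
    """CV for duplicate spots on one sample: 2|F1-F2|/(F1+F2)."""
    return 2.0 * abs(f1 - f2) / (f1 + f2)


def cv_replicate_samples(values: list) -> float:
    """CV for replicate samples: StdDevHP / MeanHP.

    Assumption: sample (n-1) standard deviation.
    """
    return stdev(values) / mean(values)


def passes_cv_filter(cv: float, max_cv: float) -> bool:
    """Keep a gene only if its CV does not exceed the threshold."""
    return cv <= max_cv


# Duplicate-spot values taken from the median-normalized example report
# elsewhere in this manual (F1=4.5267, F2=6.2408).
cv_spots = cv_duplicate_spots(4.5267, 6.2408)
cv_reps = cv_replicate_samples([2.0, 4.0, 6.0])
print(round(cv_spots, 4), cv_reps)
```

A gene whose duplicate spots differ this much would be rejected by a CV threshold of, say, 0.2, but kept with a threshold of 0.5.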
Graphical views of the data give the user additional insights into the data. These include spot intensity and ratio or Zdiff pseudoarray images, scatter plots, histogram plots, expression profile plots, cluster plots showing genes similar to a specified gene, the number of clustered genes for each clone, divisive clusters for K-means clustering, and clustergrams and dendrograms for hierarchical clustering.
When there are too many EP-plots to be viewed simultaneously, you might use a scrollable list of expression profile plots that lets you scroll through an arbitrarily large list of genes. However, it is difficult to compare genes that are not sorted in some way (i.e. clustered). Therefore, these are most useful when used after clustering the data and displaying the scrollable EP-plots of the cluster-order data.
Clustering is one way of possibly finding co-expressed genes that exhibit similar expression changes in a set of samples. Genes may show similar co-expression, but that does not prove they are co-regulated at the same point in a pathway - merely that measurements of those genes in a particular set of experiments show similar expression. However, identifying genes with similar expression for which some information is already known about some of the genes may be useful as a starting point to help figure out gene function and pathway using additional experiments and analysis.
There are many methods for doing clustering - each with advantages and disadvantages. We present three methods in MAExplorer and plan on adding a variety of more powerful methods through the MAEPlugin facility under development.
The first cluster method finds a cluster of genes whose expression profiles are similar to that of the currently selected gene. This list of genes is restricted by the constraint that the cluster distance between each of these genes to the selected gene is less than the "Cluster threshold" distance set by the user with a scroll bar. It displays genes that are found both with blue boxes (the larger the box, the higher the similarity) and in a text report window showing the genes and their distances to the current gene. By varying the threshold and observing the results, the user can find a set of highly correlated genes. If the threshold is set to 0.0, no genes are found. If it is set too high, all data filtered genes are found. So it is critical to adjust the threshold to a reasonable level commensurate with the type of data being analyzed and the approximate number of genes expected.
A second cluster method draws blue circles in the array image around all filtered genes meeting the threshold criteria, where the larger the circle, the larger the number of similar genes (i.e. passing the threshold) found to be clustered with that gene. Clicking on a gene toggles between the first and second methods. For both of these methods, MAExplorer pops up a "Cluster Distance" threshold scroller and recomputes the clusters if you change the scroller value or the current gene. It also shows a text report that displays the number of genes similar to each data filtered gene.
A third method, called "K-means" clustering, first finds K genes (which we call primary nodes) whose expression profiles are most orthogonal to each other. It uses the current gene as the first or "seed" node. It then finds the gene furthest from this and assigns it as node 2. Then the gene furthest from both nodes 1 and 2 is assigned to node 3, etc. This process is repeated until all K nodes are assigned. Then the remaining genes are assigned to the closest node. Having defined the initial cluster centers, it recomputes the centroid of each of the clusters. The centroid can alternatively be computed using a median instead of a mean, in which case we would be doing K-median clustering (Bickel, 2001). K genes are then reassigned to the nearest new centroids as the new K-means node instances. Finally, the remaining genes are assigned to the nearest centroid. A scrollable K-means cluster text window report pops up with genes sorted by cluster. Clicking on a gene in either the array image or scatter plot assigns all genes in the cluster to which that gene belongs to the "current cluster". Genes in the current cluster are labeled in the array and scatter plot with a small number of the cluster. In addition, genes in the current cluster are copied to the E.G.L. where they can be used in a report, saved in a named gene set, or used for additional filtering. It also pops up a "N-clusters" scroll bar window to let you dynamically adjust the number of clusters. Changing N will recompute the clusters. When the K-means is recomputed, it uses the current gene as the initial seed gene.
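The seeding and first assignment pass described above might be sketched in Python as follows. This is an illustration under stated assumptions, not MAExplorer's implementation: "furthest from both nodes" is interpreted as maximizing the minimum distance to the nodes already chosen, the distance is the scaled Euclidean distance, and all function names and the toy profiles are hypothetical.

```python
import math


def distance(p, q):
    """Scaled Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def pick_primary_nodes(profiles, seed, k):
    """Choose K primary nodes, starting from the seed (current) gene.

    Assumption: each new node maximizes its minimum distance to the
    nodes chosen so far.
    """
    if k > len(profiles):
        raise ValueError("k cannot exceed the number of genes")
    nodes = [seed]
    while len(nodes) < k:
        candidates = [g for g in profiles if g not in nodes]
        farthest = max(
            candidates,
            key=lambda g: min(distance(profiles[g], profiles[n]) for n in nodes),
        )
        nodes.append(farthest)
    return nodes


def assign_to_nearest(profiles, centers):
    """Map each gene to the index of its nearest cluster center."""
    return {
        g: min(range(len(centers)), key=lambda i: distance(p, centers[i]))
        for g, p in profiles.items()
    }


def recompute_centroids(profiles, assignment, centers):
    """Mean profile of each cluster; an empty cluster keeps its old center."""
    result = []
    for i, old in enumerate(centers):
        members = [profiles[g] for g, c in assignment.items() if c == i]
        if members:
            result.append([sum(col) / len(members) for col in zip(*members)])
        else:
            result.append(list(old))
    return result


# Toy data: five genes measured in two samples.
profiles = {
    "g1": [0, 0], "g2": [0, 1], "g3": [10, 10], "g4": [10, 11], "g5": [5, -5],
}
nodes = pick_primary_nodes(profiles, seed="g1", k=3)
centers = [profiles[n] for n in nodes]
assignment = assign_to_nearest(profiles, centers)
centroids = recompute_centroids(profiles, assignment, centers)
print(nodes, assignment, centroids)
```

Replacing the mean in `recompute_centroids` with a per-sample median would give the K-median variant mentioned in the text.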
The fourth method is a hierarchical clustering method that generates a clustergram and dendrogram similar to that of Eisen's red-black-green clustergram (Eisen, 1998). This was derived from the clustered correlation map (ClusCor) of Weinstein et al. (Weinstein, 1997). The MAExplorer clustergram and dendrogram are dynamic and may be interrogated and used to set the current gene. This means that it may also position a corresponding ordered list of expression profile plots to the same gene so you may view the data as a plot as well. The dendrogram may be zoomed in to explore a part of the dendrogram in more detail. As with the K-means clustering, a report can be made of the ordered genes.
Then, the expression profile is expressed as a list of values:
$$e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$$
A difference between two genes p and q may be estimated as an N-dimensional metric "distance" between $e_p$ and $e_q$. The Euclidean distance is then defined as
$$d_{pq} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_{pi} - v_{qi})^2 \right)^{1/2}$$
Other distance measures may include correlation coefficient, city-block (or Manhattan) distance, etc.
For data scaled such that $d_{pq}$ has a maximum value of 1.0 over all samples, a similarity measure could be computed as 1.0 - distance, or
$$s_{pq} = 1 - d_{pq}$$
$$d_{js} < T$$
The threshold T is set by the investigator and in MAExplorer is changed using a slider. Typically, the set of all genes {gj} found is sorted by similarity before being viewed.
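As a rough illustration of the distance, similarity, and threshold test above, the following Python sketch finds the genes whose distance to a seed gene is below T and sorts them with the most similar first. The function names, the dictionary layout, and the toy profiles are assumptions made for this example and are not taken from MAExplorer.

```python
import math


def euclidean_distance(ep, eq):
    """d_pq = sqrt((1/N) * sum_i (v_pi - v_qi)^2) over N samples."""
    n = len(ep)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ep, eq)) / n)


def similarity(ep, eq):
    """s_pq = 1 - d_pq; meaningful when the data is scaled so d_pq <= 1."""
    return 1.0 - euclidean_distance(ep, eq)


def similar_genes(profiles, seed, threshold):
    """Genes j with d_js < T, sorted by increasing distance to the seed."""
    seed_profile = profiles[seed]
    hits = [
        (gene, euclidean_distance(profile, seed_profile))
        for gene, profile in profiles.items()
        if gene != seed
    ]
    hits = [(gene, d) for gene, d in hits if d < threshold]
    return sorted(hits, key=lambda pair: pair[1])


# Toy profiles over two samples, already scaled to the range 0..1.
profiles = {"s": [0.0, 0.0], "a": [0.1, 0.1], "b": [1.0, 1.0]}
print(similar_genes(profiles, "s", 0.5))
```

With a threshold of 0.0 nothing passes the strict test, and with a very large threshold every other gene is returned, which matches the behavior described for the "Cluster threshold" slider.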
Algorithm:
D can get quite large for clustering a large number of genes N [for N=5000, this is > 50 Mbytes!]
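The size estimate can be checked with a few lines of Python. The storage layout is an assumption here, since the text does not say how D is stored: this sketch assumes 4-byte entries and a triangular matrix that includes the diagonal, which gives a figure just over 50 Mbytes for N=5000.

```python
def distance_matrix_bytes(n_genes, bytes_per_entry=4, triangular=True):
    """Approximate memory needed for a gene-by-gene distance matrix.

    Assumptions: fixed-size entries; triangular storage keeps the
    diagonal, i.e. N*(N+1)/2 entries.
    """
    if triangular:
        entries = n_genes * (n_genes + 1) // 2
    else:
        entries = n_genes * n_genes
    return entries * bytes_per_entry


print(distance_matrix_bytes(5000))                    # triangular storage
print(distance_matrix_bytes(5000, triangular=False))  # full square matrix
```

Because the entry count grows with the square of N, doubling the number of filtered genes roughly quadruples the memory needed.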
The following is a simplified definition of one way to compute a hierarchical clustering of gene expression profile data.
Algorithm:
[<field>-<grid name><row#>,<col#>]. e.g. [1-A4,3]
If there is only one field in the array, it will appear as field 1. In the above example, [1-A4,3] is field 1 grid A row 4 and column 3. Note that the pseudoarray coordinates are for visualization purposes in MAExplorer and may or may not be the same as the coordinates on the actual array. That depends on how the MAExplorer database was defined in the configuration file described in Appendix C.
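For readers who post-process saved reports, a pseudoarray coordinate such as [1-A4,3] can be split into its parts with a regular expression. This Python sketch is only an illustration: the `SpotCoordinate` type and `parse_spot_coordinate` function are hypothetical names, and the pattern assumes that grid names consist of letters only.

```python
import re
from typing import NamedTuple


class SpotCoordinate(NamedTuple):
    field: int
    grid: str
    row: int
    col: int


# [<field>-<grid name><row#>,<col#>]; grid names assumed to be letters.
_COORD = re.compile(r"\[(\d+)-([A-Za-z]+)(\d+),(\d+)\]")


def parse_spot_coordinate(text):
    """Parse a coordinate like '[1-A4,3]' into its four components."""
    match = _COORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a pseudoarray coordinate: {text!r}")
    field, grid, row, col = match.groups()
    return SpotCoordinate(int(field), grid, int(row), int(col))


coord = parse_spot_coordinate("[1-A4,3]")
print(coord)  # field 1, grid A, row 4, column 3
```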
When the current gene is defined, MAExplorer draws a yellow (green) circle around the spot in the ratio (intensity) pseudoarray image and displays other features of the gene in the three-line status area near the top of the main window. If background correction is enabled (the "Use background intensity correction" in the Normalization menu), then spot intensity values will appear as intensity' (with background intensity subtraction) and intensity (without background subtraction).
There are a number of different reporting formats available depending on the array display mode and particular normalization method selected. These include: the pseudoarray image of the intensity of a single sample, the pseudocolor ratio X/Y or Zdiff (X-Y) image (using either HP 'sets' or single samples), or the ratio of Cy3/Cy5 for dual-labeled dyes or F1/F2 for replicate spots for a single sample. In addition, the normalization mode is also displayed in the reporting line. We will present examples of each of these different reporting formats.
You may show the intensity data for a particular spot in the currently displayed pseudoarray image. First select the "Pseudograyscale image" option in the "Show Microarray" submenu in the "Plot" menu. If your data has duplicate grids (i.e. fields F1 and F2) then you may look at F1, F2 and mean (F1+F2)/2 data in the reports when you click on a spot. If the "Gang F1-F2 scrolling" switch is disabled in the "View" menu, then the intensity value is the intensity data value for the gene at that location. If the "Gang F1-F2 scrolling" switch is enabled, then it reports intensity[F1], intensity[F2], and the F1/F2 ratio. These two formats are shown in the following two examples for a C57B6 pregnancy day 13 sample in the MGAP database:
a) Field F1 spot for a single spot in a single sample with the median intensity selected.
[1-A4,5] intensity=4.5267, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
b) Field F1 and F2 replicate spots for a single sample. The top line is shown for each of the different normalization methods.
[1-A4,5] intensity[F1]=-0.3067, intensity[F2]=-0.2312, F1-F2=-0.0755, (Norm.: Zscore intensity)
[1-A4,5] intensity[F1]=4.5267, intensity[F2]=6.2408, F1/F2=0.7253, (Norm.: median intensity)
[1-A4,5] intensity[F1]=0.8755, intensity[F2]=1.1457, F1-F2=-0.2701, (Norm.: log median intensity)
[1-A4,5] intensity[F1]=-0.1442, intensity[F2]=-0.0945, F1-F2=-0.0497, (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] intensity[F1]=-0.1533, intensity[F2]=-0.1004, F1-F2=-0.0528, (Norm.: Z-score, mean abs.deviation, log intensity)
[1-A4,5] intensity[F1]=630.9911, intensity[F2]=869.9273, F1/F2=0.7253, (Norm.: calibration DNA intensity)
[1-A4,5] intensity[F1]=1919.9376, intensity[F2]=2646.957, F1/F2=0.7253, (Norm.: scale to max. (65K) intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
If the "Pseudocolor HP-X/HP-Y ratio or Zdiff" option is selected in the "Show Microarray" submenu, data is reported as either Ratio or Zdiff data depending on the normalization method selected. The data used in the following examples is for C57B6 pregnancy day 13 (HP-X) compared with Stat5a (-,-) pregnancy day 13 (HP-Y).
c) Ratio data for two samples X and Y in separate hybridized arrays. Ratio data for the field F1 and F2 spot data as well as the mnX/mnY ratio is reported. The median normalization was used in this example.
[1-A4,5] HP-XY: mn(X,Y)=(5.383,6.834) (X/Y)(F1,F2,mean)=(0.651,0.928,0.787), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
d) Zdiff data for two separate samples X and Y. Ratio data for the field F1 and F2 spot data as well as the mnX-mnY Zscore difference is reported. The three Zscore, ZscoreLog, and logMean normalizations were used in this example (first lines are shown).
[1-A4,5] HP-XY: mn(X,Y)=(-0.269,0.151) (X-Y)(F1,F2,mean)=(-0.470,-0.370,-0.420), (Norm.: Zscore intensity)
[1-A4,5] HP-XY: mn(X,Y)=(-0.119,0.051) (X-Y)(F1,F2,mean)=(-0.199,-0.142,-0.170), (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] HP-XY: mn(X,Y)=(1.010,1.224) (X-Y)(F1,F2,mean)=(-0.362,-0.064,-0.213), (Norm.: log median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
e) Example of when the "Use dual HP-X & HP-Y Pseudoimage" mode is enabled in the "Show Microarray" submenu of the "Plot" menu. This displays mean data for the HP-X and HP-Y data side-by-side. The median normalization was selected.
[1-A4,5] intensity[X]=5.3837, intensity[Y]=6.8342, X/Y=0.7877, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
f) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-X 'set' of three C57B6 samples.
[1-A4,5] HP-X 'set' mean intensity=3.295 stdDev=1.482 CV=0.449 n=3, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
g) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-Y 'set' of five Stat5a (-,-) samples.
[1-A4,5] HP-Y 'set' mean intensity=8.180 stdDev=0.986 CV=0.120 n=5, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
h) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-X and HP-Y 'sets' when the "Use dual HP-X & HP-Y Pseudoimage" mode is enabled in the "Show Microarray" submenu of the "Plot" menu.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120)\ n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
i) Multiple HP-XY 'sets' using median normalization for ratio (HP-X/HP-Y) data for the "Pseudocolor HP-X/HP-Y Ratio or Zdiff" display.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120) \ n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
j) Multiple HP-XY 'sets' p-value using median normalization for ratio (HP-X/HP-Y) data for the "Pseudocolor (HP-X,HP-Y) 'sets' p-value" display.
[1-A7,20] HP-XY: mn(X,Y)=(3.449,0.853) (X/Y)(F1,F2,mean)=(4.09,4.008,4.041), (Norm.: median intensity) CloneID: 1382656, dbEST5': 1775754, GenBank 5': AI036495, UniGene: Mm.300, plate[12,A,8] GeneName: Carbonic anhydrase 3
[1-A6,11] Cy5/Cy3=0.3588, Cy5=67.324, Cy3=187.622, (Norm.: median intensity) CloneID: IMAGE:1054189, GeneName: expressed sequence AW213287
[1-A5,16] intensX=4.695, intensY=5.923, (X-Y)=-1.2275, (Norm.: log median intensity) CloneID: IMAGE:963758, GeneName: RIKEN cDNA 2410114O14 gene
For the intensity and ratio threshold filters, the range interpretation may be inside or outside the specified range. The ratio range [R1:R2] is between 0.01 and 100.0. The Zdiff ranges [Z1:Z2] and [CZ1:CZ2] are between -4.0 and +4.0. The intensity threshold range [I1:I2] is set to the dynamic range of the min and max intensity for the current normalization method.
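The inside/outside interpretation of a threshold range can be sketched in Python as below. The function names are hypothetical and the choice of inclusive end points is an assumption; only the slider limits (0.01 to 100.0 for ratios, -4.0 to +4.0 for Zdiff) come from the text.

```python
RATIO_LIMITS = (0.01, 100.0)  # allowed span of the [R1:R2] sliders
ZDIFF_LIMITS = (-4.0, 4.0)    # allowed span of the [Z1:Z2] and [CZ1:CZ2] sliders


def clamp(value, limits):
    """Keep a slider value within its allowed span."""
    low, high = limits
    return max(low, min(high, value))


def passes_range(value, low, high, inside=True):
    """Test a value against [low:high].

    Assumption: the end points count as inside the range.
    """
    in_range = low <= value <= high
    return in_range if inside else not in_range


# A gene with X/Y ratio 2.5 tested against the range [0.5:2.0].
print(passes_range(2.5, 0.5, 2.0, inside=True))   # False
print(passes_range(2.5, 0.5, 2.0, inside=False))  # True
```

The "outside" interpretation is the one typically wanted when looking for genes that change by more than some fold amount in either direction.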
A list of possible threshold sliders is shown in the following table. When a Filter is enabled that requires a slider, it pops up the State Scrollers window that contains one or more sliders. When you disable all filters that use these sliders, the popup window will disappear. The corresponding Ratio R1[R2] or Zdiff Z1[Z2] sliders are used if you are using a ratio or Zscore normalization - and will change if the normalization changes while the filter is active.
Some of the sliders are implemented with a non-linear scale so that you have more resolution at the low end (e.g. p-Value, Spot CV, Diff HP-XY).
Depending on the set of data Filters selected, there may be multiple sliders present in the State Slider popup window (e.g. see Figure 2.4.3).
Table 3.3.1. List of threshold sliders. Sliders are enabled in the State-Scroller popup window when the corresponding data filters are enabled.
Slider name | Associated with operation |
---|---|
Spot Intensity SI1 | Filter by spot intensity range per channel |
Spot Intensity SI2 | Filter by spot intensity range per channel |
Percent SI OK | Filter by whether the percent of spots whose intensity is within the threshold range meets the AT LEAST or AT MOST criterion |
Intensity I1 | Filter by gene intensity range |
Intensity I2 | Filter by gene intensity range |
Ratio R1 | Filter by gene X/Y ratio range |
Ratio R2 | Filter by gene X/Y ratio range |
Zdiff Z1 | Filter by gene X-Y Zdiff range |
Zdiff Z2 | Filter by gene X-Y Zdiff range |
Ratio CR1 | Filter by Cy3/Cy5 gene X/Y ratio range |
Ratio CR2 | Filter by Cy3/Cy5 gene X/Y ratio range |
Zdiff CZ1 | Filter by gene (Cy3-Cy5) X-Y Zdiff range |
Zdiff CZ2 | Filter by gene (Cy3-Cy5) X-Y Zdiff range |
p-Value | Filter by t-Test |
Spot CV | Filter by Coefficient of Variation |
Cluster Distance | Plot - cluster by expression similarity |
# of Clusters | Plot - K-means clustering |
Diff HP-XY | Filter by absolute difference (HP-X,HP-Y) |
Spot Quality | Filter by continuous spot quality (If data available) |
If you are running on a windowing system supporting cut and paste, then you may cut and paste data from reports and plots into applications on your system that allow you to save or print this data. Set the Report menu table-format to "Tab-delimited". Then, in Windows 95/98/NT/2000/XP, cut data from the popup tables (or other text reports) and paste it into Microsoft Excel. In Windows, you can capture (i.e. "cut") the entire screen by pressing the "Prt Sc" or print screen button. To capture a specific window (e.g. a scatter plot), hold the "Alt" key when pressing the "Prt Sc" key. Then go into a Windows imaging application (such as PhotoShop) and paste it into the application. In PhotoShop, in the File menu, select New (or type Control/N). Then when the window is opened, click on the window and paste the MAExplorer screen you had cut into the image window by typing Control/V. In both Excel and PhotoShop you may print the data or save it in a file.
Section 4.1 discusses known bugs. Section 4.2 lists the revision notes for older versions. If you have experienced bugs with an older version of MAExplorer, you might check the revision notes to see if the bug was fixed and download a new version. Section 4.3 discusses problems in using MAExplorer as an applet with Web browsers. Section 4.4 describes handling fatal "DRYROT" errors.
If you encounter a fatal error that is detected by MAExplorer, it will pop up an error reporting window. Please E-mail this data to us so we can try to resolve the problem.
In the meantime, partially implemented commands are disabled to keep you out of trouble :-) ...
You can help us and get MAExplorer to do more of the things you would like to see. Let us know of problems that you encounter as well as suggestions for changes or new methods you would like to see - send us E-mail.
If you are experiencing Web browser problems using the MAExplorer applet, you might check the discussion of possible solutions.
"Recommended version for your computer Download installer for ...your OS..."?
Occasionally, we have seen instances where you cannot install MAExplorer from within the Web browser. The solution is to explicitly download the installer for your particular platform from the Available Installers list, and then to follow the instructions on running it.
limit stacksize unlimited
Note: An archive of some of the stable older releases is available on the NCI/LECB Web site for a limited period. |
Version 0.94.01: Major version release. |
Version 0.93.01: Major version release. |
Version 0.92.22: Last Stable Release. |
Version 0.91.01: Major version release. |
Version 0.90.01: Major version release. |
Version 0.89.01: Major version release. |
There is not enough memory to cluster current filtered clones. Options: 1. reduce the number of filtered clones and try again, or, 2. disable cluster-cache (Clustering menu) - will be VERY slow.
Figure 4.4 Example of a fatal Dryrot Error window. This may occur for a variety of reasons. This window lists the main reason and also lists some of the MAExplorer state information. If you wish, you may save this window (press the "SaveAs" button) and mail it to us. We may try to correct the problem in the next release if it is a problem with MAExplorer. Alternatively, it could be a user data error.
Figure 4.4.1 Example of a fatal Dryrot Error window after SaveAs. This tells you where the saved error message file was saved and the email address to send it to if you wish.
Release | Release Date | Manual (.zip) for Release |
---|---|---|
0.96.02 | 07-02-2002 | - |
0.95.20 | 05-31-2002 | MaeRefMan.zip (10Mb) |
0.95.16 | 05-24-2002 | MaeRefMan.zip (10Mb) |
0.95.04 | 03-22-2002 | MaeRefMan.zip (10Mb) |
Primary contributors to Cvt2Mae were Peter Lemkin (LECB/NCI), Greg Thornwall (SAIC/FCRDC), and Bob Stephens (ABCC/NIH).
We wish to thank the many members of Lothar Hennighausen's Laboratory of Genetics and Physiology (NIDDK) who inspired the initial development of MAExplorer and its continued development. Thanks also to:
Greg Alvord (SAIC/FCRDC),
Kevin Becker and Chris Cheadle (NIA/NIH),
Breast Cancer Think Tank (NCI),
Damien Chaussabel (NIAID),
Terry Clark and Josef Jurrek (U. Chicago),
Mitko Dimitrov (LECB/NCI),
Jai Evans and Chris Santos (DECA/CIT/NIH),
Troy Moore (Research Genetics),
Peter Munson (CIT/NIH),
Alan Li (SourceForge),
Quang Tri Nguyen (LECB+LCRC/NCI),
John Powell and Esther Asaki (CIT/NIH),
Eric Shen (U. Arizona),
Moshe Shani (Agr. U. Israel),
Richard Simon (NCI/NIH),
Bob Stephens and Gary Smithers (ABCC/FCRDC),
Ron Taylor (U. Colorado),
Mark Vawter (NIDA/NIH, UC-Irvine),
John Weinstein (LMP/NCI), David Kane (SRA/NCI), Ajay (LMP/NCI),
and to many others for useful discussions and suggestions that have
helped improve the MAExplorer's capabilities and usability.
Thanks also to Jeff Thomas, Charmaine Richman, and Tom Stackhouse (NCI) for helping with the MAExplorer Open Source process.
This tutorial lets you
NOTE: THIS APPENDIX IS BEING REVISED AND EXPANDED... |
---|
There is also a pre-computed example of an Ordered Condition List using 4 conditions of replicates of C57B6 (pregnancy day 13, lactation days 1 and 10, and stat5a(-,-)), 15 samples. The database also includes 4 additional condition sets of this data and an Ordered Condition List of the 4 conditions (in the State/ directory). This may be used to demo the OCL F-test filter.
If you have access to another MAExplorer database, you can use it instead since the tutorials are fairly generic.
Using the stand-alone application for the tutorial
These same subsets as well as other subsets of the MGAP data are available in the set of .mae startup files distributed with MAExplorer. To access these files,
First, select one of the start up databases.
When it starts, a main window will pop up. It then downloads the gene database tables and the particular hybridized samples you specified. When it is ready for you to begin interaction, the menu bar will become active and it will display a green Ready - click on a gene to query database message. Depending on your Internet connection speed, it may take a few minutes to set up. If you are running MAExplorer as a stand-alone application and it is getting data from your local disk, startup will be much faster.
Second, go to the A.3 instructions for the self-guided tutorial below to see what to do next.
HINT: print this tutorial page and then read the following instructions from the printout rather than trying to keep this window visible. You might also print the parts of the MAExplorer Reference Manual for the same reason.
HINT: You might want to keep a record of the commands you have used or the messages and measurements you have made. To do this you need to enable message and command history logging. Go to the View pull-down menu and then select the type of logging you want using the Show log of messages or the Show log of command history commands.
NOTE: On computers with low resolution (i.e. less than 1024 X 780) you may need to resize the windows and move them to different parts of the screen to view them simultaneously.
step 1: go to Analysis: Plot: Scatter plots: HP-X vs. HP-Y.
then click on yellow circle in scatter plot to get HP-X/HP-Y
ratio for the gene
step 2: click on any point in the scatter plot
this also alternatively defines any gene in the plot as the new
current gene
step 3: zoom in on a region of the plot using the vertical or
horizontal scroll bars
step 4: click on another point in the scatter plot to get the
HP-X/HP-Y ratio for another gene
step 5: press "Close" button to remove pop up window
step 1: go to Analysis: Plot: Scatter plots: Cy3 vs. Cy5
or go to Analysis: Plot: Scatter plots: F1 vs. F2
Then, click on green circle in scatter plot to get Cy3/Cy5
ratio for the gene
or F1/F2 ratio for replicate spots for that gene
step 2: click on any point in the scatter plot
this also alternatively defines any gene in the plot as the new
current gene
step 3: zoom in on a region of the plot using the vertical or
horizontal scroll bars
step 4: click on another point in the scatter plot to get the
HP-X/HP-Y ratio for another gene
If you are working with Cy3/Cy5 dye-swap data, you may swap the Cy3/Cy5 channel data to Cy5/Cy3 for any selected subset of samples. This may make it easier to use the data in various ways when data mining. If you do not have this type of data, go to step 7.
step 5': go to Samples: Edit (Cy5/Cy3) else use (Cy3/Cy5) menu
step 6': select the samples you wish to swap and press "Done". This
enables you to see the swapped results in the scatter plot
step 7: press "Close" button to remove pop up window
Note of caution: if the signal is close to background the X/Y ratio
may be bogus.
You can filter out low intensity genes by
NOTE: THIS APPENDIX IS BEING REVISED... |
---|
In the Filter menu, add the "Filter by ratio or Zdiff sliders". The [R1:R2] ratio range sliders are then added to the state slider window and may be used for filtering genes. If the normalization method is one of the Zscore methods, it filters by the difference of the Zscores and the [Z1:Z2] range is used; otherwise it filters by the ratio. Note that the genes that pass the filter will appear with a red (white) circle in the pseudoarray intensity (ratio) grayscale (pseudocolor) image, or a red "+" in the scatter plots, so you might try moving the controls while in those plot modes. Try some of the other filters. The spot CV test removes genes where replicate spot values (F1 and F2 in the case of a single sample, or replicate samples in the case of HP-X and HP-Y 'sets' or the HP-E 'list' of genes) are not well correlated. The t-Test filter may be used with sets of X and Y samples to find genes with a p-value less than the specified threshold.
Turn on one or more Filters (e.g. the t-test or spot CV filters) to reduce the number of genes to, say, under 100. Then press the "Go 'Cluster all genes'" button in the cluster window. This is equivalent to invoking the "Cluster counts of Filtered genes by expression profiles" command from the "Cluster plots" submenu. Notice that the Filtered genes have blue circles of different sizes. The larger the circle, the more genes there are that are similar to that gene. Move the cluster threshold slider and note that as the number of similar genes changes, the size of the blue circles changes. As with the other cluster mode, you may generate a report of sorted cluster counts. Click on the gene with the largest blue circle. This will then switch you back to single gene clustering mode where you can investigate that gene in more detail.
Note: This appendix contains a "computerese" description on how
to use MAExplorer with your array data. The user-friendly "wizard"
tool ![]() |
MAExplorer requires a specification of array geometry and quantification information. These are defined in a configuration startup file. The startup file contains the initial list of hybridized samples to be loaded, and other parameters such as the name of the configuration file (if it is different from the default name). A stand-alone application causes the .mae startup file (or the PARAM list in the case of an applet) to be read when it is started. The configuration file contains various defaults. If any of these are specified in the configuration file, they override the built-in default values. Values from the .mae startup file or applet PARAMs in turn override the configuration file values.
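The override order described above (built-in defaults, then the configuration file, then the .mae startup file or applet PARAMs) can be pictured as a layered lookup. The Python sketch below uses `collections.ChainMap` purely as an illustration; all parameter names and values except the configuration file name are hypothetical placeholders, and MAExplorer itself does not necessarily resolve parameters this way.

```python
from collections import ChainMap

# Hypothetical parameter names and values, for illustration only.
builtin_defaults = {
    "configFile": "MaExplorerConfig.txt",
    "clusterThreshold": 10.0,
    "useRatioData": False,
}
config_file_values = {"clusterThreshold": 5.0, "useRatioData": True}
startup_values = {"clusterThreshold": 2.5}  # from a .mae file or applet PARAMs

# Earlier maps win: startup values, then the configuration file, then defaults.
effective = ChainMap(startup_values, config_file_values, builtin_defaults)

for name in sorted(effective):
    print(name, "=", effective[name])
```

A parameter that appears only in the built-in defaults is still visible through the chain, so each layer needs to list only the values it changes.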
A few additional files are required and are defined in the configuration file. These include: a Gene-In-Plate-Order or GIPO file; a samples database file listing names of the samples available for loading; and a gene class names file. An optional (but deprecated) extra array information file may be specified to access additional data about samples. Quantified hybridized sample array spot data (Quant files) from each array is put into a separate data file. Note that all data files are tab-delimited files such as may be generated with Excel, relational databases or directly from array spot quantification software.
Hybridized sample arrays must be scanned and then spots quantified using other software. MAExplorer does not do spot quantification from scanned image files. However, MAExplorer can use spot data from a variety of array image quantification programs that generate tab-delimited data files. The data needs to be converted to the MAExplorer schema described in this Appendix.
The derivation of quantified spot data files from hybridized sample arrays is discussed later in this section as are in the quant file data format.
The configuration file is created once for each new array GIPO geometry and database of hybridized samples. It is independent of the number of samples. Configuration parameters include array geometry (# of grids, # of duplicate spots/gene, etc.), whether the data is intensity or ratio data (e.g. Cy3/Cy5), etc. The configuration file may also include labeling, quantification dynamic range, default analysis thresholds, mapping of user data file table-field names to expected MAExplorer names for the GIPO and quantification files, additional database-specific pull-down menu plugins, names of gene sets and sample condition lists, etc.
The GIPO file is independent of the number of array samples and describes the mapping between spot position in an array and its gene identification as well as corresponding data such as original plate number, row and column; UniGene ID, GenBank ID, dbEST ID, etc. These files will be described in more detail including how one can create the necessary database files that MAExplorer requires for use with various types of microarray data.
Directory | Files it contains |
---|---|
Cache/ | copies of any data files saved from Web DB access |
Config/ | MaExplorerConfig.txt, SamplesDB.txt, GIPO-db.txt |
MAE/ | set of startup database files (.mae) |
Images/ | set of original or sampled array .jpg images (optional) |
Plugins/ | optional set of .jar or .class MAEPlugin files |
Quant/ | set of spot quantified data files (.quant) |
Report/ | set of .txt and .gif report files generated using SaveAs |
State/ | set of gene set files (.cbs) and condition list files (.hbl) generated using Save DB |
Figure C.1 Directory structure of stand-alone databases required by MAExplorer. The "/Config", "/Quant", and "/MAE" directories are required. The /MAE directory is only used with the stand-alone version with .mae files, not for the applet. [When used with an applet, the main path is the path of the download JAR file and .mae files are not used.] The "/Report" and "/State" directories are created by MAExplorer as needed and the user need not create them prior to running MAExplorer. The text reports and plot GIF images are saved in the /Report folder when you "Save" a report or plot. When you "Save" the current database session (File | Databases | Save ...), the gene sets and sample lists are saved in the /State folder for use when you restart MAExplorer on the .mae startup file. The optional "/Cache" directory is only used (and then, only optionally) when downloading data from a Web server. The optional "/Images" directory is only used if JPEG images of the arrays are provided; their resolution and alignment must correspond to the (X,Y) spot data in the Quant files. The "/Plugins" directory is where the MAEPlugins packaged with MAExplorer are normally kept and where MAExplorer looks when you attempt to load a plugin. Since you can browse your file system, plugins do not have to be stored there.
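Before starting MAExplorer on a new database it can be convenient to confirm that the required directories exist. The sketch below is a hypothetical helper (the class `DbLayoutCheck` is not part of MAExplorer); it assumes the directory names from Figure C.1 and treats /MAE as required only for the stand-alone version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check of the database directory layout shown in Figure C.1. */
class DbLayoutCheck {
    /** Return the names of required sub-directories that are missing under dbRoot. */
    static List<String> missingDirs(Path dbRoot, boolean standAlone) {
        List<String> required = new ArrayList<>();
        required.add("Config");
        required.add("Quant");
        if (standAlone) {
            required.add("MAE"); // .mae startup files are not used by the applet
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!Files.isDirectory(dbRoot.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Missing required directories: " + missingDirs(root, true));
    }
}
```

The Report and State directories are deliberately not checked, since MAExplorer creates them as needed.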
Sample MGAP database configuration, quantification data and startup files are available for use as examples with which to make your own files or for inspection.
Similarly, when the entire database is saved (File | Databases | SaveAs ...DB) into a .mae startup file, the set of gene set files are saved as ".cbs" files and the set of condition list files are saved as ".hbl" files in the "State" subdirectory. These are automatically reloaded into MAExplorer when the .mae startup file is used to restart MAExplorer.
If your array data has JPEG or similar images of the original arrays, they should be saved in the "Images" directory. For example, the NCI-CIT mAdb database server allows you to download sampled images for your data in an "Images" subdirectory at the same time you download the other MAExplorer data files. The images can then be used by various MAEPlugin programs. If your quantified data converted to .quant files has (X,Y) coordinates corresponding to spots in these images, then you may be able to use the Montage MAEPlugin to show where the current spots are in sub-regions of all of the input images. This plugin will be available on the MAEPlugin Web site when we release the MAEPlugin facility for Beta-testing.
For a specific database (db), make sure the names of the configuration files in the /Config directory are entered in the MaExplorerConfig-db.txt file for that database. You may have multiple databases in the same /Config, /Quant and /MAE directories if the file names do not conflict. The trick is to have the .mae startup file in the /MAE directory point to the specific configFile to be used. Since MAExplorer reads the MaExplorerConfig-db.txt file when it first starts up, it discovers the names of the other database files. If there are no name conflicts, then there is no problem mixing data.
Each spot data (.quant) sample file has a name which must be entered in the Database_File field of the Samples-db.txt row entry for a new sample. The Sample_ID field is a descriptive name of that sample.
Often GIPO files supplied by array vendors have additional fields not currently used by MAExplorer. You can leave them in (they will be ignored) or take them out (loading a database is faster).
If the field headings in the various user's tables are not the same as that required by MAExplorer, you can easily fix this by adding (Table,Field) mapping entries to your version of the MaExplorerConfig-db.txt file (see mapTF for examples).
Note that the optional Menu_Source_Name entry in the Samples-db.txt file specifies the sub-menu, if any, of the Samples menu "By Source" sub-menu in which the sample will appear.
If the optional extra sample information file is used, then make sure the sample names and database file names are the same, and that there are corresponding rows in each table.
A typical sample database table might look like:
Sample_ID | Project | Database_File |
---|---|---|
control 1 | breastCancer | control1 |
control 2 | breastCancer | control2 |
control 3 | breastCancer | control3 |
tumor 1 | breastCancer | tumor1 |
tumor 2 | breastCancer | tumor2 |
tumor 3 | breastCancer | tumor3 |
You may optionally include a Database_ID field. For example:
Sample_ID | Project | Database_File | Database_ID |
---|---|---|---|
control 1 | breastCancer | control1 | 270314 |
control 2 | breastCancer | control2 | 270315 |
control 3 | breastCancer | control3 | 270316 |
tumor 1 | breastCancer | tumor1 | 270317 |
tumor 2 | breastCancer | tumor2 | 270318 |
tumor 3 | breastCancer | tumor3 | 270319 |
The Database_ID may be useful if there are file name length problems on some systems (e.g. MacOS 8-9). We offer the option of using the Database_ID as the file name for the .quant (Quant/ directory) and .jpg (Images/ directory) files rather than the Database_File name. For example, one could specify "Quant/270314.quant" and "/Images/270314.jpg" rather than the default "Quant/control1.quant" and "/Images/control1.jpg" names.
The Samples database table includes some required as well as optional fields (see Table C.2.1.1):
Table C.2.1 List of Samples data file table fields. The Samples table lists hybridized samples that are accessible to the user and may be loaded into a database session if they wish. (See Section C.1.1 for option notation.)
Field | Description |
---|---|
(req) Sample_ID | descriptive name of the sample, free text. [Note: an older deprecated name is "Membrane_ID"] |
(req) Project | project to which the sample belongs. Used for login protection and grouping of samples |
(req) Database_File | name of the .quant spot database file, no spaces. This is the file name for the sample. |
(opt) DatabaseFileID | database file ID corresponding to Database_File and Sample_ID. For use with RDBMS Web databases (e.g. experiment id #). NOTE: if you are encoding auxiliary data files using this identifier, e.g. sampled array images in the Images/ directory, then this field is required if you want to access those images. |
Table C.2.1.1 List of optional Samples data file table fields. These fields may be used for some additional operations. If they are not in the Samples DB table, then the operations will not be available. (See Section C.1.1 for option notation.)
(opt) Menu_Source_Name | Samples sub-menu to which this sample belongs. You could use the word "Default" or leave out this entry if you do not want to use sub-menus. |
(opt) Orig_File_Name | if applicable. The original file name and sample name if the data was split out from a multiple hybridized sample file. |
(opt) Strain | if applicable |
(opt) Source | if applicable |
(opt) Probe | if applicable |
(opt) Stage | if applicable (eg, developmental stage, dose, time point, etc) |
(opt) Login | (optional) TRUE if login required with a Web server else blank. This is used primarily with the Applet when interacting with a Web server |
(opt) GeneCard_URL | GeneCard ID if applicable |
(opt) Histology_URL | (e.g. MGAP) histology DB Web page if applicable |
(opt) Model_URL | (e.g. MGAP) mouse model database Web page if applicable |
(opt) BGLow | global low value of array background intensity |
(opt) BGAvg | global average value of array background intensity |
(opt) BGRms | global root-mean-square value of array background intensity |
Table C.2.1.2 List of optional Samples data file table fields. These fields are not currently used in any computations but are returned in the Sample Array report in Section 2.4.6.1.
(opt) Contributor | name of researcher submitting the sample |
(opt) Contrib_Institute | researcher's organization |
(opt) Submission_Date | when submitted |
(opt) Exposure | minutes or hours of radiolabel or fluorescent exposure |
(opt) Sample_Nbr | internal sample number |
(opt) FilterType | name of the array layout |
(opt) FilterType_Description | additional description of array layout |
(opt) Comments | details describing sample |
(opt) Researcher | researcher performing the hybridization |
(opt) SampleGrid | serial number of the array or grid or internal laboratory numbering. (Useful if reusing arrays etc) |
Some examples of typical quantified spot data files might look like:
The basic Quant spot data file table includes entries listed in Table C.3.1:
Table C.3.1 List of Quant data file table fields. This specifies the spot quantification data. There may be one or more spots, corresponding to the same gene, on each row. (See Section C.1.1 for option notation.)
Field | Description |
---|---|
(opt) field | field for duplicate genes if using single 'RawIntensity' value/Row |
(req) grid | grid name (either A,B,C,... or 1,2,3,... ) |
(req) grid col | column within a grid |
(req) grid row | row within a grid |
(opt+alt) NAME_GRC | (alternative specification of "grid, grid col, grid row"). |
(req) RawIntensity1 | intensity value for field 1. Use this form if there is more than 1 intensity value/row. |
(req) RawIntensity2 | intensity value for field 2 (required if it exists and for Cy3, Cy5 data) |
(req+alt) RawIntensity | intensity value for field 1, if only one field used |
(opt) Background1 | background intensity value for field 1 |
(opt) Background2 | background intensity value for field 2 (if it exists for F1,F2 data or Cy3, Cy5 data) |
(opt+alt) Background | background intensity value for field 1, if only one field used |
(opt) QualCheck | quality check for data indicating "bad" spots or genes. Current codes are listed in the Table C.4.2 of QualCheck semantics |
(opt) DetValue | spot data detection value quality. This could be the Affymetrix MAS5.0 "Detection p-value" or some other metric correlated with spot detection quality in the range of [0.0 : 1.0]. |
Note: If NAME_GRC is specified (e.g. for use with ImageQuant-NT data), then the explicit (grid, grid row, grid col) fields are not required. Note: For [G grids, R rows and C columns], this would cover a set of spots in the range [1,1,1] through [G,R,C].
Note: If Cy3/Cy5 double fluorescent labeling is used, then the RawIntensity1 and RawIntensity2 fields may be replaced with Cy3RI and Cy5RI names and the (RawIntensity1, RawIntensity2) fields mapped to (Cy3RI, Cy5RI) in the configuration file mapTF entries (table C.5.4 below). (See Section C.1.1 for option notation.)
Field | Description |
---|---|
(req) Cy3RI | RawIntensity1 value for Cy3 |
(req) Cy5RI | RawIntensity2 value for Cy5 |
(opt) Cy3Bkgrd | Background1 value for Cy3 |
(opt) Cy5Bkgrd | Background2 value for Cy5 |
(opt) Cy3 | RawIntensity1 value for Cy3 |
(opt) Cy5 | RawIntensity2 value for Cy5 |
Data is extracted from a table created from the gene-in-plate-order (GIPO) gene coordinate table. This links spots in a microarray to these Genomic "gene ID"s and gene names. This table may contain Clone ID, GenBank, dbEST, UniGene IDs, LocusID corresponding to these Master Gene IDs. An optional table of Clone IDs and Gene Classes the gene belongs to may also be defined.
A typical GIPO database table might look like:
Location | grid | grid col | grid row | plate | plate row | plate col | Clone ID | GenBankAcc | GeneName |
---|---|---|---|---|---|---|---|---|---|
. . . |
39 | A | 2 | 15 | 2 | 1 | 3 | 1247601 | AA763423 | "Mus musculus A kinase anchor protein (AKAP-KL) mRNA, alternatively spliced isoform 1, complete cds" |
40 | A | 2 | 16 | 2 | 1 | 4 | 1247553 | AA763380 | Mus musculus bodenin gene |
41 | A | 2 | 17 | 2 | 1 | 5 | 1247865 | AI465019 | "Mouse beta-D-galactosidase fusion protein mRNA, complete cds" |
. . . |
The basic GIPO table includes the following fields:
Field | Description |
---|---|
(opt) field | array field for duplicate genes |
grid | array grid name (either A,B,C,... or 1,2,3,... ) |
grid col | array column within a grid (either A,B,C,... or 1,2,3,... ) |
grid row | array row within a grid (either A,B,C,... or 1,2,3,... ) |
(opt+alt) NAME_GRC | alternative specification to "grid, grid col, grid row". It is generated by the Molecular Dynamics spot quantification software. |
(opt) Master Gene ID | This is the master gene identifier used in MAExplorer. It must be one or more of the identifiers listed in Table C.4.3. One of these will be selected as the Master Gene ID (MID) |
(req) Gene Name | Master Gene Name. The GeneName options are listed in Table C.4.1. These alternative GeneClasses are automatically recognized from the Gene Name. |
(opt) plate | plate name for original gene. If this is not specified, it uses the grid value. |
(opt) plate row | plate row name for original gene. If this is not specified, it uses the grid row value. |
(opt) plate col | plate column name for original gene. If this is not specified, it uses the grid col value. |
(opt) QualCheck | quality check for data indicating "bad" spots or genes. Current codes are listed in the Table C.4.2 below |
Field | Description |
---|---|
(opt) GeneName | Gene name |
(opt) Unigene cluster Name | alternative for GeneName if the latter is not specified. |
GRID- grid#-Rrow#Ccol#
For example, if grid #, row# and column# are (8,12,11), then it codes it as
GRID- 8-R12C11
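The NAME_GRC coding above can be decoded back into its (grid, row, column) numbers with a regular expression. This is a minimal sketch, assuming that the blank after "GRID-" shown in the example is optional; the class name `NameGrc` is hypothetical and not part of MAExplorer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical decoder for NAME_GRC values such as "GRID- 8-R12C11". */
class NameGrc {
    // "GRID-", optional blanks, grid number, "-R", row number, "C", column number.
    private static final Pattern GRC =
        Pattern.compile("GRID-\\s*(\\d+)-R(\\d+)C(\\d+)");

    /** Return {grid, row, col} parsed from a NAME_GRC string. */
    static int[] parse(String nameGrc) {
        Matcher m = GRC.matcher(nameGrc.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a NAME_GRC value: " + nameGrc);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        int[] grc = parse("GRID- 8-R12C11");
        System.out.println("grid=" + grc[0] + " row=" + grc[1] + " col=" + grc[2]);
    }
}
```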
Status | QualCheck value | Semantics |
---|---|---|
Good gene | 2 | the spot data is "Good" (some systems report this by a NULL quality measure). It has a good gene name. Alternatively, letter codes may be used "P", "G", "T". |
Bad gene | 4 | the spot data is bad, but it has a good gene name. |
Bad spot | 8 | is a non-analyzable spot (eg. marker, or "Bad", "Not Found", "Empty". etc.) Alternatively, letter codes may be used "A", "B", "F". |
Duplicate spot | 16 | is duplicate of another gene on array |
Marginal spot | 256 | is a marginally quantified spot. Alternatively, letter codes may be used "M". |
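The QualCheck codes in the table above can be represented as an enumeration that accepts either the numeric value or one of the alternative letter codes. The sketch below is illustrative only: the enum `QualStatus` is hypothetical, it treats each code as a single value (it does not attempt to combine values), and it assumes that an empty or missing code means "Good", following the remark that some systems report good spots with a NULL quality measure.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical mapping of QualCheck codes to spot status. */
enum QualStatus {
    GOOD_GENE(2, "P", "G", "T"),
    BAD_GENE(4),
    BAD_SPOT(8, "A", "B", "F"),
    DUPLICATE_SPOT(16),
    MARGINAL_SPOT(256, "M");

    final int value;            // numeric QualCheck value
    final List<String> letters; // alternative letter codes, if any

    QualStatus(int value, String... letters) {
        this.value = value;
        this.letters = Arrays.asList(letters);
    }

    /** Look up a status from a numeric or letter code as read from a table cell. */
    static QualStatus fromCode(String code) {
        String c = (code == null) ? "" : code.trim();
        if (c.isEmpty()) {
            return GOOD_GENE; // assumption: a missing quality measure means "Good"
        }
        for (QualStatus s : values()) {
            if (s.letters.contains(c) || String.valueOf(s.value).equals(c)) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown QualCheck code: " + code);
    }
}
```

For example, `QualStatus.fromCode("F")` and `QualStatus.fromCode("8")` both yield `BAD_SPOT`.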
Field | Description |
---|---|
(opt) Location | alternate spot identifier. E.g., Affymetrix 'probe_set', or Incyte 'IncyteID', etc. This may be numeric or alphanumeric |
(opt) Clone ID | I.M.A.G.E. consortium database clone ID. It may have a "IMAGE:" or "ATCC:" prefix |
(opt) Unigene cluster ID | NCBI UniGene database ID |
(opt) dbEST3' | NCBI dbEST database |
(opt) dbEST5' | NCBI dbEST database |
(opt) GenBankId | NCBI GenBank database |
(opt) GenBankId3' | NCBI GenBank database |
(opt) GenBankId5' | NCBI GenBank database |
(opt) RefSeqID | NCBI RefSeq database |
(opt) LocusID | NCBI LocusLink database |
(opt) OMIMID | NCBI OMIM database |
(opt) SwissProtID | Swiss-Prot database |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) GenomicMenu1 | GenBank | String | Name of the database. This will appear in the View menu |
(opt) GenomicURL1 | http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=2&form=1&term= | String | URL to which one adds the 'GenomicIDreq' value |
(opt) GenomicURLepilogue1 | | String | epilogue of the URL if any |
(opt) GenomicIDreq1 | GBID | String | Name of the GenomicID required and that is specified in the GIPO file as one of its fields |
(opt) GenomicMenu2 | UniGene | String | Name of the database. This will appear in the View menu |
(opt) GenomicURL2 | http://www.ncbi.nlm.nih.gov/UniGene/query.cgi?ORG=Mm&CID= | String | URL to which one adds the 'GenomicIDreq' value |
(opt) GenomicURLepilogue2 | | String | epilogue of the URL if any |
(opt) GenomicIDreq2 | UID | String | Name of the GenomicID required and that is specified in the GIPO file as one of its fields |
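To illustrate how the four parameters of one genomic menu entry fit together, the sketch below builds a lookup URL by appending the required ID (taken from a GIPO row) and the optional epilogue to the base URL. The class `GenomicMenuEntry` and the ID value used in `main` are hypothetical; the sketch does no URL escaping, and MAExplorer's own handling may differ.

```java
import java.util.Map;

/** Hypothetical holder for one GenomicMenu/GenomicURL/GenomicURLepilogue/GenomicIDreq set. */
class GenomicMenuEntry {
    final String menuName;  // GenomicMenuN: name shown in the View menu
    final String baseUrl;   // GenomicURLn: URL to which the ID is appended
    final String epilogue;  // GenomicURLepilogueN: optional text after the ID
    final String idField;   // GenomicIDreqN: GIPO field holding the ID

    GenomicMenuEntry(String menuName, String baseUrl, String epilogue, String idField) {
        this.menuName = menuName;
        this.baseUrl = baseUrl;
        this.epilogue = (epilogue == null) ? "" : epilogue;
        this.idField = idField;
    }

    /** Build the URL for a GIPO row, or return null if the row lacks the required ID. */
    String urlFor(Map<String, String> gipoRow) {
        String id = gipoRow.get(idField);
        if (id == null || id.trim().isEmpty()) {
            return null;
        }
        return baseUrl + id.trim() + epilogue;
    }

    public static void main(String[] args) {
        GenomicMenuEntry uniGene = new GenomicMenuEntry("UniGene",
            "http://www.ncbi.nlm.nih.gov/UniGene/query.cgi?ORG=Mm&CID=", "", "UID");
        // "12345" is a placeholder ID, not a real UniGene cluster.
        System.out.println(uniGene.urlFor(Map.of("UID", "12345")));
    }
}
```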
We are developing tools for creating and editing the configuration file. In the meantime, edit the file with Excel and save the finished table as a tab-delimited text file with the name MaExplorerConfig.txt in the Config sub-directory of the directory where your database is stored.
Table C.5 List of Configuration data file table fields.
Parameter subset | Function of these parameters |
---|---|
1. Array content & geometry | Describes the content and geometry of the arrays (required) |
2. Threshold defaults | Describes the threshold defaults (optional) |
3. Array database files | Describes the array specific database files (required) |
4. Table field mapping | Describes "mapTF" table,field mapping. This maps user defined names to names required by MAExplorer and is only required if the user names are different from the names MAExplorer expects. |
5. URL genomic databases | Describes base addresses of genomic Web DBs (optional). If you do not specify these, default values are supplied from the program. |
6. User menus | Describes user-specific menus (optional) |
The following sub-tables list the configuration parameters and some typical values that might be included. These examples illustrate the variety of parameter options with examples of values that might be used. Required entries are listed at the tops of the tables.
A typical MAExplorer minimal configuration database table might look like:
Parameter | Value | DataType | Comments |
---|---|---|---|
MAX_FIELDS | 1 | int | # replicate grids/array |
MAX_GRIDS | 2 | int | # grids/field |
MAX_GRID_COLS | 38 | int | # columns/grid |
MAX_GRID_ROWS | 27 | int | # rows/grid |
usePseudoXYcoords | true | boolean | use pseudoarray XY coord image - no XY data |
gipoFile | GIPO.txt | File | name of GIPO file from |
samplesDBfile | SamplesDB.txt | File | name of Samples DB file |
dataBase | demo | String | default name of project database |
dbSubset | demo1 | String | default database subset name |
useRatioData | true | boolean | treat duplicate (F1,F2) data as ratio (F1/F2) - i.e. Cy3/Cy5 |
EditDate | Tue Aug 21 2000 | String | demo |
Parameter | Value | DataType | Comments |
---|---|---|---|
(req) MAX_FIELDS | 2 | int | # duplicate grids (blocks, patch, etc.) of spots for each gene in the array (i.e. F1, F2, etc.). Note that Cy3 and Cy5 data for each spot count as one field. |
(req) MAX_GRID_COLS | 24 | int | # cols/grid in the array |
(req) MAX_GRID_ROWS | 9 | int | # rows/grid in the array |
(req) MAX_GRIDS | 8 | int | # grids in the array |
(opt) ignoreExtraFields | FALSE | boolean | if there are additional fields of data in the GIPO or .quant files, then ignore them. Only use the first rawIntensity field. Note: this option is not normally used. |
(opt) reuseXYcoords | FALSE | boolean | Reuse XY coordinates from first sample for rest of the samples |
(opt) SpotRadius | 7 | int | (2 to 20 pixels) 50 microns, scroller. Note: this should be set to about 4 or 5 for a 10000 gene DB. |
(opt) swapRowsColumns | FALSE | boolean | set if swap rows and columns in the array (used with our particular Research Genetics arrays) |
(opt) usePseudoXYcoords | FALSE | boolean | use pseudoarray XY coordinates image if there is no explicit XY spot position data generated by the quantification software |
(future) FIELD_LAYOUT | LtoR | String | fields are Left to Right |
(future) FIELDS_ARE_NUMBERED | TRUE | boolean | Data files contain field number. Otherwise field is extrapolated |
(future) GRID_LAYOUT | Horizontal | String | Grids are Left To Right in the array |
(future) GRID_PER_ROW | 4 | int | # grids per row in each field of the array |
Parameter | Value | DataType | Comments |
---|---|---|---|
(ratio) fluorescentLbl1 | Cy3 | String | name of dye for fluorescent label 1 |
(ratio) fluorescentLbl2 | Cy5 | String | name of dye for fluorescent label 2 |
(ratio) useRatioData | TRUE | boolean | set if data is Cy3/Cy5 ratio data otherwise it assumes intensity data for each spot |
(opt+ratio) useRatioMedianCorrection | FALSE | boolean | when using ratio data mode (Cy3/Cy5), use ratio median correction as the default |
(opt) useBackgroundCorrection | FALSE | boolean | use background correction as the default when startup |
(future) useCy5/Cy3 | FALSE | boolean | compute Cy5/Cy3 ratios instead of Cy3/Cy5 ratios |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) calibDNAname | mouse genomic DNA | String | name for calibration DNA if available - replacing cloneID in the case where the clones are not yet in the I.M.A.G.E. database. The particular clone is located using the Plate(grid,row,col) reported when selecting the current gene. |
(opt) classNameX | HP-X 'set' | String | default name of HP-X samples 'set' |
(opt) classNameY | HP-Y 'set' | String | default name of HP-Y samples 'set' |
(opt) dataBase | MGAP DB | String | name of the database project |
(opt) dbSubset | Preg 13 vs Lact 1 | String | name of the subset of data from the database |
(opt) geoPlatformID | GPL80 | String | name of the NCBI Gene Expression Omnibus (GEO) Platform Id |
(opt) maAnalysisProgram | Research Genetics Pathways 2.01 | String | name of spot quantification program |
(opt) yourPlateName | your plate | String | name of researcher's clones if available - used in the cloneID data field in the case where the clones are not yet in the I.M.A.G.E. database. The particular clone is located using the Plate(grid,row,col) reported when selecting the current gene. (See Table 2.4.1) |
(opt) emptyWellName | empty wells | String | what you called empty wells if there are any in the database. (See Table 2.4.1) |
(opt) EditDate | 06-19-00, Lemkin | String | comment why changed |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) gangSpotFlag | TRUE | boolean | set gang spot display on startup for database with duplicate spots |
(opt) presentationViewFlag | FALSE | boolean | start MAExplorer with larger fonts and graphics symbols suitable for live presentations |
(opt) showEGLflag | FALSE | boolean | show EGL genes on startup from previously saved database that had EGL genes selected. |
(opt) showMouseOver | TRUE | boolean | show mouse-over info when move mouse in windows |
(opt) useDichromasy | FALSE | boolean | use orange-blue else use red-green color scheme |
(opt) viewFilteredSpotsFlag | TRUE | boolean | view Filtered spots in the array pseudoimage. If it is off, it shows just the pseudoarray image without spots passing the filter or MAExplorer state information. |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) CanvasHorSize | 1100 | int | pixels, horizontal size of microarray image **DEPRECATED** |
(opt) CanvasVertSize | 1100 | int | pixels, vertical size of microarray image **DEPRECATED** |
(opt) fontFamily | SansSerif | String | default text font family. See Font Family for other fonts. Some fonts look better with some operating systems. |
(opt) clusterDistThr | 10 | float | default cluster similarity threshold in [0.0 : 100.0], scroller |
(opt) maxGenesReported | 50 | int | max # of genes in highest/lowest gene report |
(opt) maxPreloadImages | 4 | int | max # HP samples to initially load |
(opt) nbrOfClustersThr | 6 | int | default # clusters for K-means clustering |
(opt) pValueThr | 0.2 | float | default p-value for statistical tests |
(opt) spotCVthr | 0.25 | float | default spot Coefficient of Variation value |
(opt) allowNegQuantDataFlag | FALSE | boolean | set if .quant file data has negative intensity values otherwise it clips the negative values to 0.0 |
(opt) usePosQuantDataFlag | TRUE | boolean | Filter out genes where .quant file data has negative intensity values otherwise it uses the negative data |
Parameter | Value | DataType | Comments |
---|---|---|---|
(req) gipoFile | GIPO-DB.txt | File | Composite Gene-In-Plate-Order (GIPO) file containing the spot print order, Clone-IDs, gene names, GenBank IDs, plate coordinates, etc. (See Appendix C.4) |
(req) samplesDBfile | Samples-DB.txt | File | list of hybridized samples in the database. [Note: an older deprecated name was "membranesDBfile"]. (See Appendix C.2) |
(opt) quantFileExt | .quant | String | alternate quantification spot file name extension to use instead of ".quant". (You might set it to ".txt") (See Appendix C.3) |
[TableName],[MAE field name],[TableName],[User field name]
The following table fields may be mapped. Note: mapping is required only when the table field names of your data files are different than the internal MAExplorer table field names.
The following is an example of some of the parameters that might be added to the Configuration file to perform field name mappings. Note: these mappings are only required if the data field names are non-standard. This shows some typical field name mappings. It will not be the same for your data. (See Section C.1.1 for option notation.)
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) mapTF | GipoTable,grid,GipoTable,SA | String | GIPO table grid name (numbers or letters) |
(opt) mapTF | GipoTable,grid row,GipoTable,R | String | GIPO table row of grid name (numbers or letters) |
(opt) mapTF | GipoTable,grid col,GipoTable,C | String | GIPO table column of grid name (numbers or letters) |
(opt) mapTF | GipoTable,plate,GipoTable,RG Pl | String | GIPO table plate where clone came from |
(opt) mapTF | GipoTable,plate row,GipoTable,RG row | String | GIPO table row of plate where clone came from |
(opt) mapTF | GipoTable,plate col,GipoTable,RG col | String | GIPO table column of plate where clone came from |
(opt) mapTF | GipoTable,Clone ID,GipoTable,Clone id | String | GIPO name of Clone ID |
(opt) mapTF | GipoTable,GeneName,GipoTable,Gene name | String | GIPO table map gene name |
(opt) mapTF | GipoTable,Unigene cluster ID,GipoTable,ucid | String | GIPO table UniGene cluster id (if available) |
(opt) mapTF | GipoTable,Unigene cluster name,GipoTable,ucn | String | GIPO table UniGene cluster name (if available) |
(opt) mapTF | GipoTable,GenBank 3',GipoTable,gb3' | String | GIPO table GenBank 3' id (if available) |
(opt) mapTF | GipoTable,GenBank 5',GipoTable,gb5' | String | GIPO table GenBank 5' id (if available) |
(opt) mapTF | GipoTable,dbEST 3',GipoTable,est3' | String | GIPO table dbEST 3' id (if available) |
(opt) mapTF | GipoTable,dbEST 5',GipoTable,est5' | String | GIPO table dbEST 5' id (if available) |
(opt) mapTF | QuantTable,grid,QuantTable,SA | String | Quant table array grid name (numbers or letters) |
(opt) mapTF | QuantTable,grid row,QuantTable,R | String | Quant table row of grid name (numbers or letters) |
(opt) mapTF | QuantTable,grid col,QuantTable,C | String | Quant table column of grid name (numbers or letters) |
(opt) mapTF | QuantTable,RawIntensity,QuantTable,Intensity | String | Quant table RawIntensity data |
(opt) mapTF | QuantTable,Background,QuantTable,BkgrdIntens | String | Quant table background intensity |
(opt) mapTF | QuantTable,RawIntensity1,QuantTable,Cy3RI | String | Quant table RawIntensity1 Cy3 data |
(opt) mapTF | QuantTable,RawIntensity2,QuantTable,Cy5RI | String | Quant table RawIntensity2 Cy5 data |
(opt) mapTF | QuantTable,Background1,QuantTable,BkgrdCy3RI | String | Quant table background intensity for Cy3 |
(opt) mapTF | QuantTable,Background2,QuantTable,BkgrdCy5RI | String | Quant table background intensity for Cy5 |
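A mapTF value has four comma-separated parts: table name, MAExplorer field name, table name, user field name. The following sketch shows one way such entries could be collected and then used to translate a user's column heading into the name MAExplorer expects. The class `FieldMapper` is hypothetical, and the sketch assumes that field names themselves contain no commas.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical translator from user field names to MAExplorer field names. */
class FieldMapper {
    // Key: user table name + TAB + user field name; value: MAExplorer field name.
    private final Map<String, String> userToMae = new HashMap<>();

    /** Register one mapTF value, e.g. "QuantTable,RawIntensity1,QuantTable,Cy3RI". */
    void addMapTF(String value) {
        String[] p = value.split(",", -1);
        if (p.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 comma-separated parts in mapTF value: " + value);
        }
        String maeField = p[1].trim();
        String userTable = p[2].trim();
        String userField = p[3].trim();
        userToMae.put(userTable + "\t" + userField, maeField);
    }

    /** Return the MAExplorer name for a user field, or the field itself if unmapped. */
    String toMaeName(String table, String userField) {
        return userToMae.getOrDefault(table + "\t" + userField, userField);
    }

    public static void main(String[] args) {
        FieldMapper mapper = new FieldMapper();
        mapper.addMapTF("QuantTable,RawIntensity1,QuantTable,Cy3RI");
        System.out.println(mapper.toMaeName("QuantTable", "Cy3RI"));
    }
}
```

Fields without a mapTF entry pass through unchanged, which matches the rule that mappings are only needed when the user's names differ from the standard ones.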
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) dbEstURL | http://www.ncbi.nlm.nih.gov/irx/cgi-bin/birx_doc? dbest+ | String | NCBI dbEst server by dbEST ID. You may use an alternative server. |
(opt) GenBankAccURL | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=Nucleotide&term= | String | NCBI GenBank server by GenBankAcc ID. You may use an alternative server. |
(opt) GenBankCloneURL | http://www.ncbi.nlm.nih.gov/irx/cgi-bin/submit_form_query? TITLE=dbEST+Retrieval+Output&INPUTS=1& BRACKETS=NONE&ADDFLAGS=-b&DB=dbest& NDOCS=10&Q1= | String | NCBI GenBank entry by Clone_ID server. You may use an alternative server. |
(opt) GenBankCloneURLepilogue | [clin] | String | Epilog added after Clone_ID. You may use an alternative server. |
(opt) IMAGE2GenBankURL | http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi? ORG=Mm&ACC=IMAGE: | String | lookup GenBank from CloneID server. You may use an alternative Image to GenBank server. The "ORG=Mm" should be changed to reflect the proper species, e.g. "ORG=Hs" for human, etc. |
(opt) IMAGE2GIDURL | http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi? ORG=Mm&GID=IMAGE: | String | NCI/CIT lookup GenBank GID from CloneID server. You may use an alternative CloneID to GenBank GID server. The "ORG=Mm" should be changed to reflect the proper species, e.g. "ORG=Hs" for human, etc. |
(opt) IMAGE2unigeneURL | http://nciarray.nci.nih.gov/cgi-bin/UG_query.cgi? ORG=Mm&CLONE=IMAGE: | String | NCI/CIT lookup UNIGENE from CloneID server. You may use an alternative CloneID to UniGene server. The "ORG=Mm" should be changed to reflect the proper species, e.g. "ORG=Hs" for human, etc. |
(opt) unigeneURL | http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi? ORG=Hs&CID= | String | NCBI UNIGENE by Clone ID server. You may use an alternative UniGene server. The "ORG=Hs" should be changed to reflect the proper species, e.g. "ORG=Mm" for mouse, etc. |
(opt) locusLinkURL | http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi? SITE=104&V=1&ORG=Hs&ORG=Mm&ORG=Rn&ORG=Dr&ORG=Dm&Q= | String | NCBI LocusLink by GenBank ID server. The LocusLink server is accessed by LocusID |
gbid2LocusLinkURL | http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi?SITE=104 &V=1&ORG=Hs&ORG=Mm&ORG=Rn&ORG=Dr&ORG=Dm&Q= | String | NCBI LocusLink by LocusID server. The LocusLink server is accessed by LocusID |
(opt) swissProtURL | http://www.expasy.ch/cgi-bin/get-sprot-entry? | String | SwissProt by SwissProt ID |
(opt) omimURL | http://www.ncbi.nlm.nih.gov:80/entrez/dispomim.cgi?id= | String | NCBI OMIM database by OMIM ID |
(opt) pirURL | http://pir.georgetown.edu/cgi-bin/iproclass/iproclass?choice=entry&id= | String | PIR ProClass database by SwissProt ID |
(opt) GeneCardURL | http://bioinfo.weizmann.ac.il/cards-bin/carddisp? | String | GeneCard DB server. You may use an alternative server. |
(opt) histologyURL | http://mammary.nih.gov/models/ | String | e.g. NIDDK MGAP histology DB server. If you have an alternative histology model server, put it here. |
(opt) modelsURL | http://mammary.nih.gov/models/ | String | e.g. NIDDK MGAP mouse models DB server. You may use an alternative models server. |
(opt) proxyServer | http://www.lecb.ncifcrf.gov/cgi-bin/maeProxySvr? | String | NCI/LECB proxy server to access servers outside of the Java "sandbox". If you set up MAExplorer on your local server, then this should point to a proxy server on your system. |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) HelpMenu1 | List of hybridized samples | String | Help sub menu URL |
(opt) HelpMenu2 | MGAP animal models | String | Help sub menu URL |
(opt) HelpMenu3 | MGAP home page | String | Help sub menu URL |
(opt) HelpURL1 | http://www.lecb.ncifcrf.gov/mae/maeHybridizations.html | String | Help sub menu URL |
(opt) HelpURL2 | http://mammary.nih.gov/models/ | String | Help sub menu URL |
(opt) HelpURL3 | http://mammary.nih.gov/ | String | Help sub menu URL |
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) PluginMenuName1 | New Cluster plot | String | Plugin sub menu string |
(opt) PluginMenuStubName1 | PlotMenu:cluster | String | name of Plugin menu stub to add menu entry |
(opt) PluginClassFile1 | NewClusterPlot.jar | String | Name of class file |
(opt)sPluginCallAtStartup1 | InstallInMenu | String | handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall" |
(opt) PluginMenuName2 | New sample report | String | Plugin sub menu string |
(opt) PluginMenuStubName2 | ReportMenu:sample | String | name of Plugin menu stub to add menu entry |
(opt) PluginClassFile2 | NewSampleReport.jar | String | Name of class file |
(opt)sPluginCallAtStartup2 | InstallInMenu | String | handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall" |
(opt) PluginMenuName3 | Client-server | String | Plugin sub menu string |
(opt) PluginMenuStubName3 | -none- | String | name of Plugin menu stub to add menu entry |
(opt) PluginClassFile3 | ClineServerMAE.class | String | Name of class file |
(opt)sPluginCallAtStartup3 | InstallInMenu | String | handling plugins at startup: "InstallInMenu", "RunOnStartup", "NoInstall" |
Therefore we have created a Java conversion tool called Cvt2Mae to automate these
conversions. You may download
and install Cvt2Mae on your computer and use it to
convert your array data to MAExplorer data format. Figure C.6.1 shows
the Cvt2Mae array data converter.
Cvt2Mae is a "Wizard" driven process designed for use by molecular biologists. It handles commercial chips such as Incyte, Affymetrix, GenePix, Scanalyze, etc. or one-of-a-kind academic chips. It asks you questions to describe your chip and your data. We call the chip description the "Array Layout". After you have created or edited an array layout, you may save it for use in future conversions. [The array layouts are kept in a subdirectory "ArrayLayout" in the directory where you installed Cvt2Mae.] Since an ArrayLayout is a file, you could mail it to a collaborator. After you have answered the questions, you then run the converter and it generates the proper set of converted data files. In the case of user defined array layouts, we denote the latter as <User-defined> where the user assigns a name to that layout as part of the description. Essentially, the array layout contains a set of "rules" for describing the user's array data so Cvt2Mae knows how to read it. At some point, we plan to add the MAGE-ML standard to Cvt2Mae as one of the array layouts so it should be able to handle a wider variety of data.
Figure C.6.1 The Cvt2Mae array data converter. Selecting a Chipset Array Layout. The built-in array layouts are shown for the various chip types. User-defined layouts may be added by selecting the <User-defined> layout then editing the layout using the Edit Layout, Assign GIPO fields, and Assign Quant fields options. These options are described in more detail in the Cvt2Mae home page.
Details on Cvt2Mae, including a fuller description, PDF examples of conversions
for several different types of arrays, the download area, and the status of the converter,
are available on the Cvt2Mae home page.
    OPT_GRID_SIZE = 1200;                 /* Optimal grid size for MAExplorer viewing */
    ROWS_TO_COLS_ASPECT_RATIO = 3.0/4.0;  /* desired rows/cols aspect ratio for a grid */
    extra = 0;                            /* # of extra grid cols required */

    /* Estimate # of grids. Assume a square aspect ratio */
    if(n <= OPT_GRID_SIZE)
      nGrids = 1;
    else
      nGrids = (n / OPT_GRID_SIZE)+1;

    /* Estimate rows (r) and columns (c) from a rectangular grid
     * where cols = (4/3) rows.
     * Then, c = (4/3)r and r*c= area.
     * Then (4/3)*r*r = area or
     * r = sqrt((3/4)*area).
     */
    if(nRowsExpected > 0)
      while(true)
      { /* iterate to optimal size */
        gridSize = n/nGrids;
        nGridRows = sqrt( ROWS_TO_COLS_ASPECT_RATIO * gridSize );
        nGridCols = (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
        nGridCols += extra;
        estTotSize = (nGrids * nGridRows * nGridCols);
        if(estTotSize > nRowsExpected)
          break;
        else
          extra++;    /* keep trying until meet criteria */
      } /* iterate to optimal size */
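The estimate above can be sketched as runnable Java. This is an illustration under stated assumptions, not the Cvt2Mae source: the class and method names are hypothetical, the row and column counts are assumed to be integers (so fractional results are truncated), and a guard that keeps at least one row per grid has been added so that the loop always terminates for very small inputs.

```java
class GridEstimate {
    static final int OPT_GRID_SIZE = 1200;                       // optimal grid size for viewing
    static final double ROWS_TO_COLS_ASPECT_RATIO = 3.0 / 4.0;   // desired rows/cols ratio

    /**
     * Returns {nGrids, nGridRows, nGridCols} for n spots, widening each grid
     * by one column at a time until the layout holds more than nRowsExpected spots.
     */
    static int[] estimate(int n, int nRowsExpected) {
        if (n <= 0 || nRowsExpected <= 0) {
            throw new IllegalArgumentException("n and nRowsExpected must be positive");
        }
        int nGrids = (n <= OPT_GRID_SIZE) ? 1 : (n / OPT_GRID_SIZE) + 1;
        int gridSize = n / nGrids;
        // r = sqrt((3/4) * area); the max(1, ...) guard is an addition for this sketch.
        int nGridRows = Math.max(1, (int) Math.sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize));
        int baseCols = (int) (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
        int extra = 0;                                            // extra columns required
        while (true) {
            int nGridCols = baseCols + extra;
            long estTotSize = (long) nGrids * nGridRows * nGridCols;
            if (estTotSize > nRowsExpected) {
                return new int[] {nGrids, nGridRows, nGridCols};
            }
            extra++;                                              // keep trying until criterion is met
        }
    }

    public static void main(String[] args) {
        int[] g = estimate(5000, 5000);
        System.out.println(g[0] + " grids of " + g[1] + " rows x " + g[2] + " cols");
    }
}
```

Because `gridSize`, `nGridRows`, and the base column count do not change between iterations, the sketch computes them once and only varies the number of extra columns inside the loop.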
This section discusses the installation of MAExplorer as a stand-alone application on a variety of computers. Since Java is portable between Microsoft Windows (95/98/NT/2000/XP), Macintoshes, Linux, Solaris, etc., it is possible to freely download and install MAExplorer and Cvt2Mae on your computer and run it as an application program.
There is a discussion on using it with other arrays (Appendix C) that requires editing data files for use with MAExplorer. An array data conversion tool is being constructed which will automate this process in the future.
Figure D.1 Web page showing options for installing MAExplorer as a stand-alone application. Installers are available for Windows95/98/NT/2000/XP, Mac OS, Solaris, Linux, Unix, and other Java enabled platforms.
You first need to download MAExplorer for your particular type of operating system. These include Windows 95/98/NT/2000/XP, MacOS for Power PC, Sun Solaris, HP-UX, and other Unix versions (e.g. Linux, etc.). The Windows, MacOS and Solaris versions include a Java run-time (Java Virtual Machine) that works with MAExplorer. We recommend you download the full distribution for your computer (which includes a recent Java virtual machine (JVM) if it exists). This ensures proper operation of MAExplorer and does not interfere with other Java applications you might have installed or will install.
This installation process uses a commercial "Java Installer" (InstallAnywhere(TM) by ZeroG Inc.) that requests you "Grant" it permission to save the installation on your computer. It will suggest where to install it or you can install it wherever you want. For example, in Windows it may suggest saving it in C:\Program Files\MAExplorer\ - you can specify an alternative directory if the default disk does not have much free space left.
It may involve more than a single patch. It is the latest Recommended
Patch Cluster from Sun. We STRONGLY recommend having
your System Administrator do this for you if you have not done this
before. Point your Web browser to:
http://sunsolve.Sun.COM/pub-cgi/show.pl?target=patches/patch-access
and choose the appropriate patch set for the version of Solaris (2.6, 7, or 8, etc.) that you are running. Do not choose any of the x86 versions unless you are running Solaris x86. Click on either the Download HTTP option or Download FTP option, and click the GO button to download the patch set.
Parameter | Value | DataType | Comments |
---|---|---|---|
(opt) enableFIOcaching | TRUE | boolean | enable caching data files from Web server on local computer |
(opt) saCodeBase | http://www.lecb.ncifcrf.gov/mae/ | String | Web database to use to get the data |
(opt) useWebDB | TRUE | boolean | if set, get data from a Web database |
(installation path)/MAExplorer MAE/(some startup file).mae
You can let your Unix system find MAExplorer by putting it in your path variable in your login or shell startup script.
set path = ($path <MAExplorer installation path>)
Then, you would start it by specifying the startup file residing in the MAE/ subdirectory as:
MAExplorer MAE/(some startup file).mae
There is a set of sample MGAP .mae files in the MAE subdirectory in the downloaded installation.
The .mae startup files are simply tab-delimited ASCII files with a .mae file extension. These could be created or edited either manually (e.g. using Microsoft Excel and saving the file as a tab-delimited file) or by various database programs (e.g. the NCI/CIT mAdb program, the MAExplorer Cvt2Mae program being developed). They may also be generated by MAExplorer (File:Database:Save as file DB). The .mae file format consists of two tab-delimited columns containing the fields Name and Value. These field names appear in the first row. This is followed by instances of the various parameters. A simple .mae file is shown in Table D.4, using a 4 sample Lactation database subset from the MGAP database. Although any of the configuration file values can be specified in the .mae file, some of the more common optional parameters are listed in Table D.4.1.
Name | Value |
---|---|
image1 | C57B6-L1-30min |
image2 | C57B6-L3-1hr |
image3 | C57B6-L10-29hrs-1 |
image4 | Stat5a.--.L1-30min |
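Since a .mae file is just two tab-delimited columns with a Name/Value header row, reading one is straightforward. The following is a minimal sketch of such a reader, assuming exactly that layout; the class `MaeStartupSketch` and its `parse` method are hypothetical and do not represent how MAExplorer itself reads the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MaeStartupSketch {
    /** Parses tab-delimited Name/Value text into an ordered map, skipping the header row. */
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        boolean firstRow = true;
        for (String line : text.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                continue;                       // ignore blank lines
            }
            String[] fields = line.split("\t", -1);
            String name = fields[0].trim();
            String value = fields.length > 1 ? fields[1].trim() : "";
            if (firstRow) {
                firstRow = false;
                if (name.equals("Name") && value.equals("Value")) {
                    continue;                   // header row holding the field names
                }
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // The four sample entries from Table D.4.
        String mae = "Name\tValue\n"
                   + "image1\tC57B6-L1-30min\n"
                   + "image2\tC57B6-L3-1hr\n"
                   + "image3\tC57B6-L10-29hrs-1\n"
                   + "image4\tStat5a.--.L1-30min\n";
        parse(mae).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A `LinkedHashMap` is used so that the entries keep the order in which they appear in the file, which matters for the numbered `image1`, `image2`, ... entries.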
Table D.4.1 Some of the common optional entries for .mae startup files. These entries are shown with example values for the 4 samples in Table D.4. See Appendix C.5 for lists of many other options.
Name | Value | DataType | Comments |
---|---|---|---|
(opt) maxPreloadImages | 4 | int | override the number of samples (called images) to actually load. This may be less than the number of image entries. |
(opt) configFile | MaExplorerConfig-MGAP.txt | String | name of Configuration file if not MaExplorerConfig.txt |
(opt) dataBase | MGAP DB | String | name of this specific database |
(opt) dbSubset | Pregnancy 13 days: C57BL/6 vs. stat5a (-,-), 8 samples | String | title for this subset name of the database |
(opt) Xlist | 1,2,3 | String | hybridized samples for initial HP-X 'set'. Corresponding to image1, image2, etc. Empty if not defined - may be defined using the Choose HP-X(Y,E) in the File menu. |
(opt) Ylist | 4 | String | hybridized samples for initial HP-Y 'set'. Corresponding to image1, image2, etc. Empty if not defined - may be defined using the Choose HP-X(Y,E) in the File menu. |
(opt) Elist | 1,2,3,4 | String | hybridized samples for initial HP-E 'list'. Corresponding to image1, image2, etc. Empty if not defined - may be defined using the Choose HP-X(Y,E) in the File menu. |
(opt) classNameX | C57B6 lactation (days 1,3,10) | String | Experimental class name for the HP-X 'set' of hybridized samples |
(opt) classNameY | Stat5a (-,-) lactation day 1 | String | Experimental class name for the HP-Y 'set' of hybridized samples |
(opt) noMsgReporting | TRUE | boolean | If set TRUE, used with Applet only to not send loading status message. |
(opt) reuseXYcoords | FALSE | boolean | If set TRUE and the quantified data files have the (x,y) coordinates for each spot, then use the same coordinates for all subsequent data files so that the arrays can be superimposed (for Flickering two HPs). |
(opt) usePseudoXYcoords | FALSE | boolean | If set TRUE, force MAExplorer to generate pseudoarray (X,Y) spot coordinates and ignore (X,Y) data in the quantified spot files if it exists. This will be set to TRUE automatically if there are no (X,Y) data fields in the quantified spot files. |
The following is a simple example of HTML code containing an applet which will invoke MAExplorer. You may add other options with PARAMs in the Applet (or, for that matter, in the .mae startup file) that override any options normally specified in the Configuration file (see Appendix C.5).
    <HTML>
    <HEAD>
    <TITLE>MAExplorer Startup: C57B6 Pregnancy vs Lactation</TITLE>
    </HEAD>
    <BODY>
    <H2>MAExplorer Startup: C57B6 Pregnancy vs Lactation</H2>
    This startup database will start the MAExplorer. It contains a subset
    of the database consisting of four C57B6 mammary development
    hybridized samples (HP): two each for pregnancy and lactation.
    <APPLET CODE=MAExplorer.class ARCHIVE=MAExplorer.jar
            WIDTH=10 HEIGHT=10 ALIGN=absmiddle>
      <PARAM NAME=configFile VALUE=MaExplorerConfig-MGAP.txt>
      <PARAM NAME=dbSubset VALUE="C57B6 pregnancy vs lactation">
      <PARAM NAME=image1 VALUE=C57B6-p13.1>
      <PARAM NAME=image2 VALUE=C57B6-L1-30min>
      <PARAM NAME=image3 VALUE=C57B6-p13.2poly-A>
      <PARAM NAME=image4 VALUE=C57B6-L1-total>
      <PARAM NAME=Xlist VALUE=1,3>
      <PARAM NAME=Ylist VALUE=2,4>
      <PARAM NAME=Elist VALUE=1,3,2,4>
      <PARAM NAME=classNameX VALUE=Pregnancy>
      <PARAM NAME=classNameY VALUE=Lactation>
      (Sorry, you need a Java-capable browser to view this.)
    </APPLET>
    </BODY>
    </HTML>
.mae file name | X vs Y comparison | # of hybridizations |
---|---|---|
C57vsDevModels-15probes.mae | HP-XY is C57B6 vs developmental models | 15 |
C57vsDevModels-15probes-cache.mae | HP-XY is C57B6 vs developmental models (with cache) | 15 |
C57vsDevModels-38probes.mae | HP-XY is C57B6 vs developmental models. HP-E is all samples | 38 |
Lact1-C57vsStat5a-38probes.mae | HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-). HP-E is all samples | 38 |
Lact1vs10-10probes.mae | HP-XY is C57B6 Lactation day 1 vs Lactation day 10. HP-E is all samples | 10 |
Lact1vs10-38probes.mae | HP-XY is C57B6 Lactation day 1 vs Lactation day 10. HP-E is all samples | 38 |
Lact-C57vsStat5a-5probes.mae | HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-) | 5 |
Lact-C57vsStat5aCEBPnull-19probes.mae | HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-) and CEBP-null, HP-E has samples of other tissues | 19 |
MAEstartupDefault.mae | none | none |
Preg13day-C57vsStat5a-19probes-cache.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). HP-E has samples of other tissues (with cache) | 19 |
Preg13day-C57vsStat5a-19probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). HP-E has samples of other tissues | 19 |
Preg13day-C57vsStat5a-38probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). HP-E is all samples | 38 |
Preg-C57vsStat5a-4probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-) | 4 |
Preg13VsLact1-18probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Lactation day 1. HP-E is all samples | 18 |
Preg13VsLact1-38probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Lactation day 1. HP-E is all samples | 38 |
Preg-C57vsStat5a-8probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-) | 8 |
Preg13day-Stat5aVsCEBP-null-38probes.mae | HP-XY is C57B6 Lactation day 1 vs Stat5a (-,-) and CEBP-null, HP-E is all samples | 38 |
reuseXY-Preg13day-C57vsStat5a-38probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). Use XY coords of first probe for remainder for flickering. HP-E is all samples | 38 |
reuseXY-Preg-C57vsStat5a-8probes.mae | HP-XY is C57B6 Pregnancy day 13 vs Stat5a (-,-). Use XY coords of first sample for remainder for flickering | 8 |
MGAP-50samples.mae | C57B6 day 13 preg. vs day 1 lact., 50 samples | 50 |
OCL-P13L1L10Stat5a--15probes.mae | replicates of C57B6 (pregnancy day 13, lactation days 1 and 10) and stat5a(-,-), 15 samples. The database also includes 4 additional condition sets of this data and an Ordered Condition List of the 4 conditions (in the State/ directory). This may be used to demo the OCL F-test filter. | 15 |
Another major decision was to use multiple pop-up windows for 2D plots, histograms, expression profiles, clustergrams, reports, dialog boxes, etc. rather than sharing a single window. These windows are maintained by a special pop-up registry that handles many of the bookkeeping chores involved with tracking and updating multiple windows viewing the same underlying data. Whenever an event occurs which may change the set of data filtered genes, the current gene or the current cluster set of genes, the registry is notified. Some of the events are the current clone changed, the Filter parameters changed, the sample labels changed, the normalization method changed, etc. It in turn notifies all relevant active plots, tables and reports - requesting them to update themselves if necessary. This object-oriented design greatly simplifies the process of synchronizing the various data presentations with changes in the database.
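The registry described above follows the familiar observer pattern. The following sketch illustrates the idea only; every name in it (`PopupRegistrySketch`, `Change`, `Popup`, `Registry`, `CountingPopup`) is hypothetical and is not taken from the MAExplorer source.

```java
import java.util.ArrayList;
import java.util.List;

class PopupRegistrySketch {
    /** Kinds of events that may invalidate what a popup window shows (illustrative). */
    enum Change { CURRENT_GENE, FILTER, SAMPLE_LABELS, NORMALIZATION }

    /** A popup plot, table, or report that can redraw itself on request. */
    interface Popup {
        boolean dependsOn(Change change);
        void refresh(Change change);
    }

    /** Tracks the active popups and forwards change notifications to them. */
    static class Registry {
        private final List<Popup> active = new ArrayList<>();

        void register(Popup p) { active.add(p); }

        void unregister(Popup p) { active.remove(p); }

        /** Notifies every relevant popup and returns how many were refreshed. */
        int notifyChange(Change change) {
            int refreshed = 0;
            // Iterate over a copy so a popup may unregister itself while refreshing.
            for (Popup p : new ArrayList<>(active)) {
                if (p.dependsOn(change)) {
                    p.refresh(change);
                    refreshed++;
                }
            }
            return refreshed;
        }
    }

    /** Toy popup that counts refreshes for a single kind of change. */
    static class CountingPopup implements Popup {
        final Change interest;
        int refreshCount = 0;

        CountingPopup(Change interest) { this.interest = interest; }

        public boolean dependsOn(Change change) { return change == interest; }

        public void refresh(Change change) { refreshCount++; }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        CountingPopup scatterPlot = new CountingPopup(Change.FILTER);
        CountingPopup geneReport = new CountingPopup(Change.CURRENT_GENE);
        registry.register(scatterPlot);
        registry.register(geneReport);
        System.out.println("refreshed: " + registry.notifyChange(Change.FILTER));
    }
}
```

The code that changes the data only needs to call `notifyChange`; it does not have to know which windows are currently open.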
A good intersection of the server-centric and client-centric methods is to distribute the computation and data to the systems where they can be handled most effectively. Because Java enables computation in a Web browser, PCs currently available have enormous power and memory, and high-speed Internet connections are readily available, it is now possible to distribute some of the data and computations to the desktop. If high-speed direct manipulation methodology is to be made available on the Internet for microarray data mining, then it must be brought to the user's desktop browser or local computer rather than residing solely on the back-end server. This is the approach taken in designing the MAExplorer.
Table E.2 Comparison of client-centric vs. server-centric data mining. The table shows a comparison of some of the features of client-centric and server-centric (using CGI and/or Applet) data mining analysis methods. The client-centric approach presented here primarily uses Java with data downloaded to the client's computer. A server-centric approach might use a mix of HTML, CGI, servlet and Java. However, even a client-centric approach may take advantage of server support for additional functionality (e.g. accessing genomic servers to gain additional information about specific genes or sets of genes).
Approach | Advantage (+) / disadvantage (-) | Feature |
---|---|---|
Client-centric a) | + | Java programs run (pretty much) on all operating system platforms as either stand-alone or applets (in browsers) |
Client-centric b) | + | handles rapid response required for direct manipulation on the new generation of very fast desktop computers |
Client-centric c) | + | stand-alone version may be restarted quickly from local data or data cached from the Web server |
Client-centric d) | + | size limitations are not a problem with stand-alone Java applications |
Client-centric e) | + | Java plug-ins allow prototyping new local and Web DB analysis method functionality by any group of users |
Client-centric f) | - | for the applet version, there is slow startup because the program and all data have to be downloaded each time it is run |
Client-centric g) | - | difficult to build large stable Web-applets handling very large data sets. However, stand-alone applications don't have this problem |
Client-centric h) | - | for the stand-alone application version, it must be installed on client's computer where there might be some level of incompatibility |
Approach | Advantage (+) / disadvantage (-) | Feature |
---|---|---|
Server-centric a) | + | may have better resources for very large data sets but with dependence on server |
Server-centric b) | + | faster startup than downloaded applet since minimal GUI is required and data does not have to be loaded before computation requests may be made to the server |
Server-centric c) | + | may be easier to prototype and distribute new functionality using third party software such as RDBMS, S-plus, etc. using centralized CGI or servlets where only one copy is required on the server |
Server-centric d) | - | susceptible to Internet traffic bandwidth problems for large numbers of users |
Server-centric e) | - | susceptible to server-load dependencies for large numbers of users |
Server-centric f) | - | difficult to get very rapid response for direct manipulation for data mining |
The following figures show the top level plugin design.
In support of the MGAP server, additional software was written to automate the pre-processing of the microarray quantitative data from Research Genetics' Pathways array quantification analysis program and perform compression and Web server updates for this data. The Web server also hosts several common gateway interface (CGI) programs. These include user login support, a Web proxy server (to access other genomic Web sites from the Java applet), support of login-protected user state file access, custom database creation, user state files, and "groupware" user-access support.
This data may be used for learning about MAExplorer with the tutorials and for investigating some of the stages of normal and mouse-model mammary development. The MAExplorer reference manual may be viewed in your browser from this Web site. Alternatively, you may download the full manual as an Acrobat PDF file, MaeRefMan.pdf (> 5Mb).
If you have problems with the installation, then you might want to read the rest of this section and also the part of the manual which discusses installation (Appendix D) and using it with your arrays (Appendix C). The latter requires editing your data files for use with MAExplorer. Cvt2Mae, a "wizard" array data conversion tool, automates this process.
If you have previously installed MAExplorer and you want to update just the MAExplorer.jar file (the actual program), you can do this as described in Section 1.3. Alternatively, you can use the new "Update MAExplorer" command in the File menu. This will (1) back up the current MAExplorer.jar file as MAExplorer.jar.bkup; and (2) copy the latest MAExplorer.jar file from the maexplorer.sourceforge.net Web site to replace the MAExplorer.jar file in your installation directory. Then when you restart MAExplorer, it will use the new version of the program.
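The back-up-then-replace sequence can be sketched with the standard `java.nio.file` API. This only illustrates the two file operations and assumes the new jar has already been downloaded to a local file; it is not the code behind the "Update MAExplorer" command, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class JarUpdateSketch {
    /** Copies currentJar to "<name>.bkup", then overwrites currentJar with newJar. */
    static Path replaceWithBackup(Path currentJar, Path newJar) throws IOException {
        Path backup = currentJar.resolveSibling(currentJar.getFileName() + ".bkup");
        Files.copy(currentJar, backup, StandardCopyOption.REPLACE_EXISTING);  // step (1)
        Files.copy(newJar, currentJar, StandardCopyOption.REPLACE_EXISTING);  // step (2)
        return backup;
    }

    /** Runs the two steps on small stand-in files in a temporary directory. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("mae-update");
            Path jar = dir.resolve("MAExplorer.jar");
            Path downloaded = dir.resolve("downloaded.jar");
            Files.write(jar, "old".getBytes(StandardCharsets.UTF_8));
            Files.write(downloaded, "new".getBytes(StandardCharsets.UTF_8));
            Path backup = replaceWithBackup(jar, downloaded);
            String backupText = new String(Files.readAllBytes(backup), StandardCharsets.UTF_8);
            String jarText = new String(Files.readAllBytes(jar), StandardCharsets.UTF_8);
            return backup.getFileName().toString().equals("MAExplorer.jar.bkup")
                    && backupText.equals("old")
                    && jarText.equals("new");
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("backup and replace succeeded: " + demo());
    }
}
```

Making the backup first means the previous version can be restored by copying the .bkup file back if the new jar turns out to be unusable.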
After initially installing MAExplorer (or Cvt2Mae, for that matter), you can simply download the latest .jar file and overwrite the previous version you had when you installed the program. The MGAP demo data can be downloaded separately.
SourceForge Download MAExplorer Installer |
---|
Figure. Web page showing options for installing MAExplorer as a stand-alone application. Installers are available for Windows95/98/NT/2000/XP, MacOS-8/9, MacOS-X, Solaris, HP-UX, Linux, Unix, and other Java enabled platforms. [Click on the figure to see a high resolution version.] NOTE: the MacOS installer is currently not available. If you have problems with the Sun installer, you may need to update your Solaris OS system patches (see below).
2. You start the download process when you click on the installer for your computer platform. (You may alternatively use the default installer discussed below.) Follow the directions it provides as you download the installer. It also provides instructions in the "View" hyperlink adjacent to the operating system you selected that tells you what to do after you finished the download. Part of the installation consists of telling the installer where you want to 1) put the executable installer (a temporary directory where you have lots of room is a good choice), and 2) the "installation" directory where you will typically leave the distribution after the installer unpacks it.
We use the commercial InstallAnywhere(TM) program to create the installers. It provides installers for:
The default installer will put the installer executable in a fixed directory and the installed MAExplorer files in another fixed directory.
Note that the installers (where possible) will include a copy of a recent Java Virtual Machine (JVM) from InstallAnywhere(TM) to make running MAExplorer on your computer more robust. This is used locally and only affects the running of MAExplorer. It will not affect any other Java applications on your computer. In the case of Mac OS, if you have an older version of the MRJ JVM, it will ask you if you want to upgrade to the newer version (MRJ-2.4.5) - however you do not have to unless you want to.
The MAExplorer Reference Manual describes the details of MAExplorer as well as showing a number of screens illustrating various data-mining operations. Several tutorials are available and are discussed in the Reference Manual.
.mae startup file | Data set contents |
---|---|
Lact-C57vsStat5a-5probes.mae | 5 probes. (X,Y) is lactation day 1 (C57B6, Stat5a(-,-)) |
Lact-C57vsStat5aCEBPnull-19probes.mae | 19 probes. (X,Y) subset is lactation day 1 (C57B6, Stat5a(-,-) + CEBP-null) |
Lact1-C57vsStat5a-38probes.mae | 38 probes. (X,Y) subset is lactation day 1 (C57B6, Stat5a(-,-)) |
Lact1vs10-38probes.mae | 38 probes. (X,Y) subset is C57B6 lactation day (1,10) |
MAEstartupDefault.mae | No initial samples loaded |
Preg-C57vsStat5a-4probes.mae | 4 samples. (X,Y) is pregnancy (C57B6, Stat5a(-,-)) |
Preg-C57vsStat5a-8probes.mae | 8 samples. (X,Y) is pregnancy (C57B6, Stat5a(-,-)) |
Preg13VsLact1-38probes.mae | 38 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-)) |
Preg13day-C57vsStat5a-19probes-cache.mae | 19 samples from MGAP Web server. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-)) |
Preg13day-C57vsStat5a-19probes.mae | 19 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-)) |
Preg13day-C57vsStat5a-38probes.mae | 38 samples. (X,Y) subset is pregnancy (C57B6, Stat5a(-,-)) |
Preg13day-Stat5aVsCEBP-null-38probes.mae | 38 samples. (X,Y) subset is pregnancy (Stat5a(-,-),CEBP-null) |
reuseXY-Preg-C57vsStat5a-8probes.mae | Same as other startup, but uses XY coordinates of 1st sample |
reuseXY-Preg13day-C57vsStat5a-38probes.mae | Same as other startup, but uses XY coordinates of 1st sample |
C57vsDevModels-15probes-cache.mae | 15 samples from MGAP cache. (X,Y) subset is (C57B6, knock-outs) |
C57vsDevModels-15probes.mae | 15 samples. (X,Y) subset is (C57B6, knock-outs) |
C57vsDevModels-38probes.mae | 38 samples. (X,Y) subset is (C57B6, knock-outs) |
MGAP-50samples.mae | 50 samples. All of the public samples sorted alphabetically |
If you are on a Macintosh system, then start MAExplorer and then run the startup .mae file you want by going to the File menu and then the Databases submenu. Use the "Open disk DB" option to browse your disk and then open up the startup file of interest.
If you are on a Unix system, then you supply the MAE file explicitly in the command line. You might consider adding the "installation" directory to your UNIX $PATH or $path variable to have UNIX automatically find the executable binary.
    cd installation-directory/
    MAExplorer.bin MAE/Preg13VsLact1-38probes.mae
    limit stacksize unlimited
In addition, we have set the default stack size that MAExplorer uses to 256Mbytes. If your computer has less physical memory, it will page. To avoid this, edit the MAExplorer.lax file found where you installed MAExplorer and change the two instances of memory allocation from 256000000 to a smaller number that is less than your actual memory size. You may also increase this number if you have more memory and want to use it.
http://sunsolve.Sun.COM/pub-cgi/show.pl?target=patches/patch-access
and choose the appropriate patch set for the version of Solaris (2.6, 7, or 8) that you are running. Do not choose any of the x86 versions unless you are running Solaris x86. Click on either the Download HTTP option or Download FTP option, and click the GO button to download the patch set.
A: For Mac-X, with 256 character file names, this is not a problem. For MacOS 8 and 9 with 32 character file names it may be a problem. Because MAExplorer uses file extensions (eg. ".quant"), you are currently limited to 25 characters or less. We will be modifying the system to remove this limit.
Q: I tried unsuccessfully to open NCI/CIT mAdb data (nciarray.nih.gov) on a Mac OS system. I generated a .zip file using the mAdb "BETA Formatted Array Data Retrieval Tool", then decompressed this .zip file using "Stuffit Expander" on my Mac. The Start.mae file could not be opened by MAExplorer. What can I do to fix this?
A: Stuffit Expander (with default settings) removes a form feed character from decompressed text files. This prevents Start.mae (and other text files used by MAExplorer) from being read by MAExplorer. To fix this, set Stuffit Expander so that it keeps the form feed characters when it decompresses text files:
1. Open Stuffit Expander by double-clicking its icon.
2. Click on menu File -> Preferences.
3. Click on "Cross Platform".
4. Click on the "Never" button of 'Convert text file to Macintosh format:'.
Your .zip file will then be decompressed properly, and the text files from your mAdb data can be opened by MAExplorer.
Q: How do I start MAExplorer on my data automatically by double-clicking a Start.mae file on my Mac?
A: There is no easy way to do this at this time. Use the File menu, Databases, Open Disk DB browser to specify the Start.mae file.
    % MAExplorer
    Stack size of 97664 Kb exceeds current limit of 8192 Kb.
    (Stack sizes are rounded up to a multiple of the system page size.)
    See limit(1) to increase the stack size limit.
If MAExplorer is slow to load on a Sun (under Solaris) or reports memory errors (shown above), first check what the memory limits are set to on your machine using the "limit" command. If they are too small, they should be increased or set to "unlimited" (see 2.4 above).
    # LAX.NL.JAVA.OPTION.JAVA.HEAP.SIZE.MAX
    # -------------------------------------
    lax.nl.java.option.java.heap.size.max=256000000

    # LAX.NL.JAVA.OPTION.NATIVE.STACK.SIZE.MAX
    # ----------------------------------------
    lax.nl.java.option.native.stack.size.max=256000000
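Editing the two memory entries by hand is simple, but the change can also be sketched in code. The example below rewrites both values in the text of a .lax file and leaves every other line untouched. It is an illustration only; the class `LaxMemorySketch` and its method are hypothetical, and only the two property names are taken from the excerpt above.

```java
class LaxMemorySketch {
    static final String HEAP_KEY = "lax.nl.java.option.java.heap.size.max";
    static final String STACK_KEY = "lax.nl.java.option.native.stack.size.max";

    /** Returns laxText with both memory limits set to the given number of bytes. */
    static String withMemoryLimit(String laxText, long bytes) {
        String[] lines = laxText.split("\n", -1);   // -1 keeps trailing empty lines
        for (int i = 0; i < lines.length; i++) {
            String trimmed = lines[i].trim();
            // Comment lines start with '#' and therefore never match a key prefix.
            if (trimmed.startsWith(HEAP_KEY + "=") || trimmed.startsWith(STACK_KEY + "=")) {
                lines[i] = trimmed.substring(0, trimmed.indexOf('=') + 1) + bytes;
            }
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String lax = "# LAX.NL.JAVA.OPTION.JAVA.HEAP.SIZE.MAX\n"
                   + HEAP_KEY + "=256000000\n"
                   + "# LAX.NL.JAVA.OPTION.NATIVE.STACK.SIZE.MAX\n"
                   + STACK_KEY + "=256000000\n";
        // 128000000 is an arbitrary example value below the 256000000 default.
        System.out.print(withMemoryLimit(lax, 128000000L));
    }
}
```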
MAExplorer has a Java plugin extension facility. Plugins written for MAExplorer are called "MAEPlugins". These MAEPlugins allow investigators to extend the core capabilities of the MAExplorer program themselves by writing special programs to implement new analysis methods and access data from their MAExplorer database(s). The design of this plugin extension enables users to write these new methods and have them added to the MAExplorer menus or for plugins to be invoked when MAExplorer starts up. In addition, default MAExplorer functionality could be changed by replacing existing MAExplorer methods with user defined methods. Writing a plugin to extend functionality using our Open Java API (Application Programming Interface) is easier than understanding and modifying the full MAExplorer program. This section of the Web site describes the API, describes how to write a MAEPlugin, and gives examples of various plugins. All source code is available on our CVS Repository.
The Open Java API is available as the set of MJAxxxx classes in the
MAExplorer.jar file.
MAExplorer is open source with a Mozilla 1.1 general public license. However, we have made the MAEPlugins public domain (a secondary license that is even less restrictive) with no restrictions on their use. This enables the research community to modify and help improve MAExplorer and the MAEPlugins as required. We are dividing the plugins into those that are donated and those that require interaction with the supplier. We hope that most plugin developers will make them available as open source, but that is not a requirement. If you are interested in writing a plugin or working with us on this open source project please contact us.
As we noted, the Open Java API is included in the regular .jar file distributed when you download MAExplorer. The current MAExplorer jar file may be downloaded as MAExplorer.jar. You will also automatically download the jar file when installing MAExplorer. If you have MAExplorer installed, then you can use the (File menu | Update MAExplorer from maexplorer.sourceforge.net) command when running MAExplorer to get the latest MAExplorer.jar file release.
This document discusses how MAEPlugins are used with MAExplorer and the design used to give them access to MAExplorer data. The first part discusses the top level design and the second part gives an example of using a plugin. The details on the internals of MAExplorer itself are described in a Design doc (PDF) or (PPT). However, an understanding of the MAExplorer internals is not required to write a MAEPlugin.
Figure 2. Open Java API for MAEPlugins. Each type of application could be derived from specialized Java classes that contain most of the access methods required for that type of analysis. The Gather - Scatter API is a means of "gathering" data from MAExplorer internal data structures for the plugin. When a plugin wants to store data back into MAExplorer, it is "scattered" back into the internal data structures. This is implemented using the MaeJavaAPI and MJAxxx classes described in the Open Java API.
Figure 3. Loading a MAEPlugin from your file system using the Load Plugins command in the Plugins pull down menu. If you have a plugin .jar or .class file, it may be specified using the "Load plugin" command. This pops up a file browser to let you specify the plugin file.
Figure 4. Executing the new command previously loaded in the Plugin menu. Selecting the new "Show List Active Filters" command that now appears in the Plugins menu invokes the plugin. This pops up a report shown in the next figure.
Figure D.5. Popup window from executing the MAEPlugin. This plugin gives a full report on the data Filter status in a new pop up window.
This document describes the MAEPlugins Open Java API (Applications Programming Interface) to enable researchers to write their own MAEPlugins for use with MAExplorer. The Open Java API (or API) is presented here as two javadoc trees.
The Open Java API is automatically included in the MAExplorer.jar file. Although it wastes some space, we are exporting the symbol tables with the files in MAExplorer.jar so that you could use it with a debugger (such as Forte for Java Community Edition) to develop a MAEPlugin. Forte 4.0 has been renamed "Sun One". We have prepared a document Configuring SourceForge's CVS to work with Forte on Windows for MAExplorer that describes how to set up a software development environment.
MJAxxxx Class | Objects and method access |
---|---|
MJAbase | base class and constants used by other MJA classes |
MJAcluster | cluster data structures and methods |
MJAcondition | condition lists of samples and ordered lists of condition lists |
MJAeval | command interpreter to invoke MAExplorer commands |
MJAexprProfile | expression profiles data |
MJAfilter | data filters |
MJAgene | single gene data |
MJAgeneList | lists of genes and gene sets |
MJAgenomicDB | genomic databases on the Internet |
MJAgeometry | array geometry, spot to gene maps, etc. |
MJAhelp | popup browser help methods |
MJAhistogram | histogram plots |
MJAmath | built-in math functions |
MJAnormalization | normalization data and methods |
MJAplot | scrollable 2D plot support [Future] |
MJAproperty | get and put individual properties |
MJApropList | get lists of properties |
MJAsample | get and put single sample top-level data |
MJAsampleList | get lists of samples top-level data |
MJAscrollablePlot | scrollable 2D plot support [Future] |
MJAsort | built-in sort methods |
MJAstatistics | built-in statistics methods |
MJAstate | get and save state, get additional state info |
MJAutil | built-in utility methods |
This document briefly describes how to write MAEPlugins using the MJA Open Java API (Application Programming Interface). It discusses key issues to be addressed when writing a MAEPlugin and describes them in sufficient detail to enable researchers to write their own MAEPlugins for use with MAExplorer. Note that there are basically two types of plugins: one-shot plugins (e.g., pop up a window with its own user interface or perform an operation one time) and pipeline plugins. The latter include FilterPlugins and NormalizationPlugins, which MAExplorer inserts into the gene filtering chained intersection analysis and the normalization analysis, respectively. See examples of existing plugins to help understand the differences.
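The "chained intersection" idea behind pipeline filter plugins can be pictured with a small sketch. It assumes that each filter is simply a yes/no test on a gene ID and that a gene survives only if every filter in the chain accepts it. The names `FilterChain`, `add` and `apply` are hypothetical and are not part of the MAEPlugin API; a real FilterPlugin is derived from the MAEPlugin base class instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration only: not part of the MAEPlugin API. */
final class FilterChain {
    private final List<IntPredicate> filters = new ArrayList<>();

    /** Append a filter to the chain, much as a pipeline plugin is inserted. */
    FilterChain add(IntPredicate filter) {
        filters.add(filter);
        return this;
    }

    /** Return the gene IDs accepted by every filter (the intersection). */
    List<Integer> apply(int[] geneIds) {
        List<Integer> passed = new ArrayList<>();
        for (int id : geneIds) {
            boolean keep = true;
            for (IntPredicate f : filters) {
                if (!f.test(id)) {   // one rejection removes the gene
                    keep = false;
                    break;
                }
            }
            if (keep) passed.add(id);
        }
        return passed;
    }
}
```

For example, chaining an "even ID" test with an "ID greater than 3" test over the IDs 1 to 6 leaves only 4 and 6. A one-shot plugin, by contrast, would run once when its menu entry is selected and would not take part in such a chain.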
This document gives a simple tutorial example of MAEPlugins source code. After you have read this you might look at some of the source code from actual plugins. Note that there are several base class plugins (PopupPlugin, FilterPlugin, NormalizationPlugin, etc.) that require different override methods or have abstract methods you must implement. Look at the examples to clarify this.

    import MAEPlugin.popup.PopupPlugin;
    import MAEPlugin.*;

If you are writing other types of plugins, you need to import those instead (e.g., MAEPlugin.analysis.NormalizationPlugin, MAEPlugin.analysis.FilterPlugin, etc.).
The XxxxxPlugin() method is called at the time the plugin is loaded. Any particular actions that may be required can be performed at that time. In this example, we merely set the name of the plugin as it is to appear in the Plugins pull-down menu.
The pluginMain() method is called at the time the plugin is invoked by selecting the menu entry.
The four special event handling methods updateCurGene(), updateFilter(), updateSlider(), and updateLabels() are invoked by the MAExplorer PopupRegistry when any of these events occurs. If you are doing nothing with the events, they may be no-ops. However, if you want to take action on these events, you would normally implement the actual event handling code in your Xxxxx.java class.
    /** File: ExamplePlugin.java */
    import MAEPlugin.popup.PopupPlugin;
    import MAEPlugin.*;

    /**
     * This class invokes the ExamplePlugin plugin.
     */
    public class ExamplePlugin extends PopupPlugin implements MAEUpdateListener
    {
      /** The current instance of a plugin called "Example".
       * The instance may be non-null if run previously and is needed to kill
       * a previous instance when new instances are created.
       */
      private Example eObj= null;

      /**
       * ExamplePlugin() - this is the constructor end-users must implement
       * to use the API. It is called at the time the plugin is loaded.
       */
      public ExamplePlugin() throws PluginException
      { /* ExamplePlugin */
        /* Note: "Example plugin" is a string that appears in the
         * Plugin menu.
         */
        setMenuLabel("Example plugin");

        /* Register for current gene, Filter, label and slider events */
        MJApopupRegistry pr= MAExplorer.mja.mjaPopupRegistry;
        int propBits= (pr.PRPROP_CUR_GENE | pr.PRPROP_FILTER |
                       pr.PRPROP_LABEL | pr.PRPROP_SLIDER |
                       pr.PRPROP_UNIQUE);
        pr.addUniquePopupWindowToReg(this, "ShowListActiveFilters", propBits);
      } /* ExamplePlugin */

      /** pluginMain() - the method end-users must implement to use the API.
       * It is invoked when the user selects the plugin in a menu.
       */
      public void pluginMain()
      { /* pluginMain */
        MaeJavaAPI mja= MAExplorer.mja;  /* Open Java API library access */

        if(eObj==null)
          eObj= new Example(mja);
        else
        { /* re-run Example on new data */
          eObj.dispose();
          eObj= null;
          System.gc();
          mja.mjaUtil.maeRepaint();
          eObj= new Example(mja);
        }
      } /* pluginMain */

      /** updateCurGene() - update any data since current gene has changed.
       * This is invoked by the MAExplorer PopupRegistry.
       * @param mid is the MID (Master Gene ID) that is the new current gene.
       */
      public void updateCurGene(int mid)
      { if(eObj!=null) eObj.updateCurGene(mid); }

      /** updateFilter() - update any dependent data since the data Filter
       * has changed. This is invoked by the MAExplorer PopupRegistry.
       */
      public void updateFilter()
      { if(eObj!=null) eObj.updateFilter(); }

      /** updateSlider() - update any dependent data since a threshold slider
       * has changed. This is invoked by the MAExplorer PopupRegistry.
       */
      public void updateSlider()
      { if(eObj!=null) eObj.updateSlider(); }

      /** updateLabels() - update any dependent data since global labels
       * have changed. This is invoked by the MAExplorer PopupRegistry.
       */
      public void updateLabels()
      { if(eObj!=null) eObj.updateLabels(); }

      /**
       * close() - close the plugin. This will be called if you
       * had specified the plugin as PRPROP_UNIQUE since previous
       * instances will be closed before the new instance is started.
       * @param preserveDataStructuresFlag to save data structures
       */
      public void close(boolean preserveDataStructuresFlag)
      { if(eObj!=null) eObj.close(); }

    } /* end of class ExamplePlugin */
    /** File: Example.java */
    import java.awt.Frame;

    /* A real popup would typically also implement ActionListener,
     * WindowListener, etc.
     */
    public class Example extends Frame
    {
      /** Example() - Constructor */
      public Example(MaeJavaAPI mja)
      { /* Example */
        /* [1] Access Open Java API required through MaeJavaAPI instances
         * of these MJA classes.
         */
        MJAfilter mjaFilter= mja.mjaFilter;             /* Open Java API library */
        MJAgeneList mjaGeneList= mja.mjaGeneList;       /* Open Java API library */
        MJAproperty mjaProperty= mja.mjaProperty;       /* Open Java API library */
        MJAsampleList mjaSampleList= mja.mjaSampleList; /* Open Java API library */

        /* [2] Get the data */
        String
          sR= "Example of some data accessed from MAExplorer\n",
          maePrjPath= mjaProperty.getMaeCurProjectPath(),
          maeBrowserTitle= mjaProperty.getMaeBrowserTitle(),
          maeDatabase= mjaProperty.getMaeDatabaseTitle(),
          maeDbSubset= mjaProperty.getMaeDbSubsetTitle();

        /* List the names of the active data filters */
        String sActive[]= mjaFilter.getListFilterNames();
        int nActive= sActive.length;
        sR += " LIST OF ACTIVE FILTERS\n";
        for(int i=0;i<nActive;i++)
          if(sActive[i]!=null)
            sR += " " + sActive[i] + "\n";

        /* List the sample names */
        int nSamples= mjaSampleList.getNbrHPsamples();
        String sampleNames[]= mjaSampleList.getHP_Elist_SampleNames();
        sR += " LIST OF SAMPLES\n";
        for(int i=0;i<nSamples;i++)
          sR += sampleNames[i] +"\n";

        /* List the genes that passed the data Filter */
        int
          filteredMIDlist[]= mjaGeneList.getMIDindicesForFilterGeneList(),
          nFilteredGenes= filteredMIDlist.length;
        String filteredGeneNames[]=
                 mjaGeneList.getGeneFieldDataFromGeneList("workingCL", "GeneName");
        sR += " LIST OF FILTERED GENES\n";
        for(int i=0;i<nFilteredGenes;i++)
          sR += "Gene ["+filteredMIDlist[i]+"] = "+filteredGeneNames[i]+"\n";

        System.out.println(sR);   /* print to java console */
      } /* Example */

      /* In this example, no actions are taken on popup registry events.
       * However, the methods must exist in the code.
       */
      public void updateCurGene(int mid) { }
      public void updateFilter() { }
      public void updateSlider() { }
      public void updateLabels() { }

      /** close() - release the window resources of this Frame */
      public void close() { this.dispose(); }

    } /* end of class Example.java */
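The way the example plugin keeps a single live instance (the `eObj` field) and guards every event handler against a missing instance can be tried out without MAExplorer. The following is a minimal sketch using hypothetical stand-in classes, `Report` and `UniquePluginSketch`; neither exists in MAExplorer, and no MJA calls are made.

```java
/** Hypothetical stand-in for the Example popup; not a MAExplorer class. */
final class Report {
    private boolean disposed = false;

    void dispose() { disposed = true; }

    boolean isDisposed() { return disposed; }
}

/** Hypothetical stand-in for ExamplePlugin; not a MAExplorer class. */
final class UniquePluginSketch {
    private Report current = null;   // plays the role of eObj

    /** Mirrors pluginMain(): release any earlier report, then build a new one. */
    Report pluginMain() {
        if (current != null) {
            current.dispose();
        }
        current = new Report();
        return current;
    }

    /** Mirrors the update handlers: true only if there is a live report to notify. */
    boolean updateFilter() {
        return current != null && !current.isDisposed();
    }
}
```

Calling `pluginMain()` twice disposes the first report and leaves exactly one live report, which is the behavior the PRPROP_UNIQUE property asks for in the example above.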
This document lists all of the MAEPlugins alphabetically, by analysis method, and also links to MAEPlugins available on other Web sites. The MAEPlugins include those donated to the MAExplorer Open source Web site. All plugins distributed from this Web site will have the Java source code, JAR file and documentation. Some of these MAEPlugins were incorporated into MAExplorer after they were written because of their key functionality. However, we are leaving them on the Web site to serve as examples.
If you want to use the jar file plugins directly: (1) install MAExplorer from the list of Jar files on this Web site, (2) get the jar file(s) from the plugins below and save them in the Plugins/ directory where you installed MAExplorer, (3) run MAExplorer and use the (Plugins | Load Plugin) menu command to load the plugin. After it is loaded, just use it as you would any other menu command. The Plugins-jar.tar file is available with all of the MAEPlugin jar files. Simply unpack the directory using Unix tar or a Windows unzip program into a directory you can access when running MAExplorer. To let MAExplorer go directly to these files when you do a (Plugins | Load plugins) menu command, copy the .jar files into the Plugins/ directory where you previously installed MAExplorer. We also periodically update the MAEPlugins-....-src.tar.gz file in the Files download area. Files from the following list of MAEPlugins are archived as follows: source files are from the CVS archive, jar files are from the Web server archive of plugin .jar files, documentation is also from the CVS archive.
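The copy step described above can also be scripted. This is a minimal sketch, assuming the plugin archive has already been unpacked into some directory; the class name `PluginInstaller` and both directory arguments are hypothetical, and only standard `java.nio.file` calls are used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not distributed with MAExplorer. */
final class PluginInstaller {
    /** Copy every *.jar in unpackedDir into pluginsDir and return the number copied. */
    static int copyJars(Path unpackedDir, Path pluginsDir) throws IOException {
        Files.createDirectories(pluginsDir);   // make sure Plugins/ exists
        int copied = 0;
        try (DirectoryStream<Path> jars =
                 Files.newDirectoryStream(unpackedDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, pluginsDir.resolve(jar.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }
}
```

Here `pluginsDir` would be the Plugins/ directory under the folder where MAExplorer was installed; the plugins still have to be loaded from the Plugins menu afterwards.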
If you want to use these plugins as a basis for developing your own plugins, see developing a plugin and other resources available on this Web site. The source code for each plugin is available below. We encourage, but don't require, plugin writers to donate their new plugin analytic methods to the MAExplorer Open Source Web site for others to use.
Cvt2Mae is a Java program designed to make it easier for researchers to use MAExplorer by helping them convert their data into the MAExplorer format. Cvt2Mae handles commercial chips such as Affymetrix, as well as other standard formats such as GenePix and Scanalyze or one-of-a-kind custom academic chips (<User-defined>). In addition, you may specify the fields of interest for the "Print file" or (GIPO or Gene-In-Plate-Order) file, and the fields containing the quantified data.
Cvt2Mae converts the specific chip information you enter into what we call an "Array Layout". This Array Layout file may be edited and saved for use in future conversions and shared with collaborators. Essentially, the Array Layout contains a set of "rules" for converting the user's data. After you have filled out the forms in Cvt2Mae, it will generate the set of converted data files and directories to be used directly with MAExplorer.
There are several slide shows describing how to use Cvt2Mae to convert various data sets. They consist of a series of screen shots from Cvt2Mae that go through each of the steps on how to set up the parameters and convert your data. There are two for Affymetrix data: one is a downloadable PDF and the other is an extensive online version.
Instructions on downloading and installing Cvt2Mae.
If you are still having problems, email the help desk. Please include:
The Edit Layout wizard also has its own information area that is used for reporting. When you hold the mouse over a field on the left side of the wizard window, information about that parameter will appear in the lower message area.
    OPT_GRID_SIZE = 1200;                  /* Optimal grid size for MAExplorer viewing */
    ROWS_TO_COLS_ASPECT_RATIO = 3.0/4.0;   /* desired rows/cols aspect ratio for a grid */
    extra = 0;                             /* # of extra grid cols required */

    /* Estimate # of grids. Assume a square aspect ratio */
    if(n <= OPT_GRID_SIZE)
      nGrids = 1;
    else
      nGrids = (n / OPT_GRID_SIZE)+1;

    /* Estimate rows (r) and columns (c) from a rectangular grid
     * where cols = (4/3) rows.
     * Then, c = (4/3)r and r*c= area.
     * Then (4/3)*r*r = area or
     * r = sqrt((3/4)*area).
     */
    if(nRowsExpected > 0)
      while(true)
      { /* iterate to optimal size */
        gridSize = n/nGrids;
        nGridRows = sqrt( ROWS_TO_COLS_ASPECT_RATIO * gridSize );
        nGridCols = (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
        nGridCols += extra;
        estTotSize = (nGrids * nGridRows * nGridCols);
        if(estTotSize > nRowsExpected)
          break;
        else
          extra++;   /* keep trying until meet criteria */
      } /* iterate to optimal size */
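The pseudocode above can be turned into a small runnable Java sketch. Several points are assumptions, since the pseudocode does not state them: all quantities are treated as integers (so the square root and the division are truncated), `n` and `nRowsExpected` are passed in as separate arguments, and a guard keeps the row count at 1 or more so the loop always terminates. The class name `GridEstimator` is hypothetical.

```java
/** Hypothetical sketch of the grid-size estimate; not the Cvt2Mae source. */
final class GridEstimator {
    static final int OPT_GRID_SIZE = 1200;                       // optimal grid size
    static final double ROWS_TO_COLS_ASPECT_RATIO = 3.0 / 4.0;   // rows/cols

    /** Returns {nGrids, nGridRows, nGridCols}. */
    static int[] estimate(int n, int nRowsExpected) {
        int nGrids = (n <= OPT_GRID_SIZE) ? 1 : (n / OPT_GRID_SIZE) + 1;
        int nGridRows = 0;
        int nGridCols = 0;

        if (nRowsExpected > 0) {
            int gridSize = n / nGrids;
            // r = sqrt((3/4) * area); truncation to int is an assumption.
            nGridRows = (int) Math.sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize);
            // Guard added for this sketch: with 0 rows the loop would never end.
            nGridRows = Math.max(1, nGridRows);
            int baseCols = (int) (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);

            // Add extra columns until the grids can hold more than nRowsExpected.
            for (int extra = 0; ; extra++) {
                nGridCols = baseCols + extra;
                if (nGrids * nGridRows * nGridCols > nRowsExpected) {
                    break;
                }
            }
        }
        return new int[] { nGrids, nGridRows, nGridCols };
    }
}
```

Under these assumptions, `estimate(5000, 5000)` gives 5 grids of 27 rows by 38 columns (5130 positions), and `estimate(1200, 1200)` gives 1 grid of 30 rows by 41 columns.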
Figure 1. shows the Affymetrix tab-delimited data in Excel (after missing fields have been edited as described above).
Figure 2. Initial state of the Cvt2Mae Program. The user must select an array layout or define one in order to analyze the input data file or files.
Figure 3. Selecting a Chipset Array Layout. The built-in array layouts are shown for the Incyte and Affymetrix. User-defined layouts would be added by selecting the <User-defined> layout.
Figure 4. Select one or more user input data files by pressing the "Browse input file name" button and then pick a file. If the layout indicates that it may contain more than one hybridization, it will attempt to find the data. You can subsequently rename individual samples which may be necessary if you are reading several files with the same sub-sample names. After the file browser pops up, select a user input data file. If you are using a file that contains all of your samples, then you only need to specify one file. If you have several files, then repeat this step until you have added all of the files you want.
Figure 5. Files selected by user and samples "discovered" in the data file. Each input file is analyzed to determine if it has multiple samples and, if so, they are added to the list of input files below step 2.1 in the window. You may remove any samples, which may be necessary for bad data. You may rename any sample, which may be necessary if the same sample name occurs in several different data files (they are actually different samples).
Figure 6. Edit Layout Wizard for name of the Array Layout. A) is the original array layout from the database. B) Since we may want to edit it, we will rename the vendor and Array Layout name. This will enable us to save the changed layout if we wish. You may not override system-defined layouts, but you may override your own layouts or save a system layout under a new name (as is shown here).
Figure 7. Edit Layout Wizard for Grid Geometry.
Figure 8. Edit Layout Wizard for Starting Data Rows.
Figure 9. Edit Layout Wizard for Ratio or Intensity data.
Figure 10. Edit Layout Wizard for optional (X,Y) spot coordinates available in the input data.
Figure 11. Edit Layout Wizard for optional Genomic ID values available in the input data.
Figure 12. Edit Layout Wizard for optional Gene Names available in the data.
Figure 13. Edit Layout Wizard for optional calibration DNA available in the data and UniGene species prefix.
Figure 14. Edit Layout Wizard for optional user names for Project, Database, Subdatabase, etc.
Figure 15. Edit Layout Wizard for optional HP-X and HP-Y 'set' experimental class (i.e. condition) names.
Figure 16. Edit Layout Wizard for changing the default data filter threshold slider values.
Figure 17. Edit Layout Wizard for Assign GIPO fields. These Gene-In-Plate-Order data field mappings should only be changed if required for additional data fields you may have added to your input file. All fields should be defined (this is required for <User-defined> data). In general, it may be ok to have some non-critical genomic ID fields undefined.
Figure 18. Edit Layout Wizard for Assign Quant fields. These Quantification data field mappings should only be changed if required to define all fields (it is required for <User-defined> data).
Figure 19. Saving modified Array Layout if you have made changes. This is useful if you have changed the array layout with "Edit Layout", "Assign GIPO fields", or "Assign Quant fields" so that you can use it another time.
Figure 20. Selecting the output folder in which to save the converted files. The Magenta "Save Layout" button means that you may save the edited array layout if you wish. You now need to create an output folder to put the converted data. You may create a New Folder, use an Existing Folder or use the Same Folder that contained the input files. We selected the "New Folder" option.
Figure 21. Browse to select the output folder in which to save the converted files. You may create a new folder here. Select the "name" of the folder - don't go into the folder.
Figure 22. shows the interface after selection of the output file folder using a file browser. Notice that the current project directory is now displayed in the interface as well as the location of the MAExplorer Start.mae file that will be generated. The data will be created when the Run button is pressed.
Figure 23. shows the conversion being performed after the user pressed the RUN button. This process takes a minute or so depending on the speed of the computer and the complexity of the data.
Figure 24. shows the conversion summary instructions after the conversion is finished. At this point press the DONE button to exit the converter.
Figure 25. shows the files that are generated by Cvt2Mae for use by MAExplorer. The generated data consists of several directories that are described in the Reference Manual Appendix C.
Figure 26. Starting MAExplorer on the converted data by clicking on the Start.mae file. Note that the location of the "MAExplorer startup file:" is specified. Go to that file and click on it to start MAExplorer. Alternatively, start MAExplorer and do "File | Open Disk DB" and open that file to start it.
Table 1. below lists the various types of downloads: program installers, source code files, jar files, and information on installing the programs. The Java API documentation is also available. Table 2. lists various ways to download the Mammary Genome Anatomy Program (MGAP) public data set that can be used with MAExplorer.
Click on the entries to download the files.
Program | Installer Version | Update Program Version | Program installers | Information on installing | Source | Jar file(s) |
---|---|---|---|---|---|---|
MAExplorer | 0.96.34.01 | 0.96.34.01 | MAExplorer | installing MAExplorer | source code | MAExplorer.jar |
MAEPlugins | - | - | (not required) | Using MAEPlugins | source code | List of MAEPlugins |
Cvt2Mae | 0.73 | 0.73 | Cvt2Mae | installing Cvt2Mae | source code | Cvt2Mae.jar |
Download method | Web address |
---|---|
A single gzip file from SourceForge | SourceForge.net: MGAP-Array-database.tar.gz |
As separate files | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database/ |
A single zip file | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.zip |
A single tar file | http://www.lecb.ncifcrf.gov/mae/MGAP-Array-database.tar |
Similarly, in the Cvt2Mae program, pressing the "Update Cvt2Mae" button will repeat the same process except that it does it for the Cvt2Mae.jar file and creates a backup file called Cvt2Mae.jar.bkup.
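The backup step mentioned above can be sketched as follows. Only the backup file name (the jar name with `.bkup` appended) comes from the text; the rest is an assumption about how such an update might be done, namely that the new jar has already been downloaded to a temporary file. The class name `JarUpdater` and its parameters are hypothetical, and this is not the actual Cvt2Mae update code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of a "back up, then replace" jar update. */
final class JarUpdater {
    /**
     * Copy jar to "<jar name>.bkup" in the same directory, then move the
     * downloaded file into the jar's place. Returns the backup path.
     */
    static Path replaceWithBackup(Path jar, Path downloaded) throws IOException {
        Path backup = jar.resolveSibling(jar.getFileName() + ".bkup");
        Files.copy(jar, backup, StandardCopyOption.REPLACE_EXISTING);
        Files.move(downloaded, jar, StandardCopyOption.REPLACE_EXISTING);
        return backup;
    }
}
```

Keeping the previous jar next to the new one means a failed or unwanted update can be undone by copying the `.bkup` file back over the jar.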
However, you can generate your own javadocs for the code using the Unix scripts CreateMAExplorerJavaDocs.do for MAExplorer and CreateCvt2MaeJavaDoc.do for Cvt2Mae.
View | Javadoc folder |
---|---|
Full javadocs (public+private) for MAExplorer | docsFull |
Full javadocs (public only) for MAExplorer | docsAllPublic |
Open Java API javadocs for MAExplorer | docsOJAPI |
MaeJavaAPI (MJA) javadocs for MAExplorer | docsMJA |
Full (public+private) javadocs for Cvt2Mae | javadocs |
Mozilla Public License 1.1 (MPL 1.1)
1. Definitions.
1.1. ''Contributor'' means each entity that creates or contributes to the creation of Modifications. 1.2. ''Contributor Version'' means the combination of the Original Code, prior Modifications used by a Contributor, and the Modifications made by that particular Contributor. 1.3. ''Covered Code'' means the Original Code or Modifications or the combination of the Original Code and Modifications, in each case including portions thereof. 1.4. ''Electronic Distribution Mechanism'' means a mechanism generally accepted in the software development community for the electronic transfer of data. 1.5. ''Executable'' means Covered Code in any form other than Source Code. 1.6. ''Initial Developer'' means the individual or entity identified as the Initial Developer in the Source Code notice required by Exhibit A. 1.7. ''Larger Work'' means a work which combines Covered Code or portions thereof with code not governed by the terms of this License. 1.8. ''License'' means this document. 1.8.1. "Licensable" means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently acquired, any and all of the rights conveyed herein. 1.9. ''Modifications'' means any addition to or deletion from the substance or structure of either the Original Code or any previous Modifications. When Covered Code is released as a series of files, a Modification is:
B. Any new file that contains any part of the Original Code or
previous Modifications.
1.10.1. "Patent Claims" means any patent claim(s), now owned or hereafter acquired, including without limitation, method, process, and apparatus claims, in any patent Licensable by grantor. 1.11. ''Source Code'' means the preferred form of the Covered Code for making modifications to it, including all modules it contains, plus any associated interface definition files, scripts used to control compilation and installation of an Executable, or source code differential comparisons against either the Original Code or another well known, available Covered Code of the Contributor's choice. The Source Code can be in a compressed or archival form, provided the appropriate decompression or de-archiving software is widely available for no charge. 1.12. "You'' (or "Your") means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License or a future version of this License issued under Section 6.1. For legal entities, "You'' includes any entity which controls, is controlled by, or is under common control with You. For purposes of this definition, "control'' means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity.
The Initial Developer hereby grants You a world-wide, royalty-free, non-exclusive license, subject to third party intellectual property claims:
(b) under Patents Claims infringed by the making, using or selling of Original Code, to make, have made, use, practice, sell, and offer for sale, and/or otherwise dispose of the Original Code (or portions thereof). (d) Notwithstanding Section 2.1(b) above, no patent license is
granted: 1) for code that You delete from the Original Code; 2) separate
from the Original Code; or 3) for infringements caused by: i) the
modification of the Original Code or ii) the combination of the Original
Code with other software or devices.
Subject to third party intellectual property claims, each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license (a) under intellectual property rights (other than patent or trademark) Licensable by Contributor, to use, reproduce, modify, display, perform, sublicense and distribute the Modifications created by such Contributor (or portions thereof) either on an unmodified basis, with other Modifications, as Covered Code and/or as part of a Larger Work; and (b) under Patent Claims infringed by the making, using, or selling of Modifications made by that Contributor either alone and/or in combination with its Contributor Version (or portions of such combination), to make, use, sell, offer for sale, have made, and/or otherwise dispose of: 1) Modifications made by that Contributor (or portions thereof); and 2) the combination of Modifications made by that Contributor with its Contributor Version (or portions of such combination). (c) the licenses granted in Sections 2.2(a) and 2.2(b) are effective on the date Contributor first makes Commercial Use of the Covered Code. (d) Notwithstanding Section 2.2(b) above, no patent license is granted: 1) for any code that Contributor has deleted from the Contributor Version; 2) separate from the Contributor Version; 3) for infringements caused by: i) third party modifications of Contributor Version or ii) the combination of Modifications made by that Contributor with other software (except as part of the Contributor Version) or other devices; or 4) under Patent Claims infringed by Covered Code in the absence of Modifications made by that Contributor.
The Modifications which You create or to which You contribute are governed by the terms of this License, including without limitation Section 2.2. The Source Code version of Covered Code may be distributed only under the terms of this License or a future version of this License released under Section 6.1, and You must include a copy of this License with every copy of the Source Code You distribute. You may not offer or impose any terms on any Source Code version that alters or restricts the applicable version of this License or the recipients' rights hereunder. However, You may include an additional document offering the additional rights described in Section 3.5. 3.2. Availability of Source Code.
3.3. Description of Modifications.
3.4. Intellectual Property Matters
If Contributor has knowledge that a license under a third party's intellectual property rights is required to exercise the rights granted by such Contributor under Sections 2.1 or 2.2, Contributor must include a text file with the Source Code distribution titled "LEGAL'' which describes the claim and the party making the claim in sufficient detail that a recipient will know whom to contact. If Contributor obtains such knowledge after the Modification is made available as described in Section 3.2, Contributor shall promptly modify the LEGAL file in all copies Contributor makes available thereafter and shall take other steps (such as notifying appropriate mailing lists or newsgroups) reasonably calculated to inform those who received the Covered Code that new knowledge has been obtained. (b) Contributor APIs.
3.6. Distribution of Executable Versions.
3.7. Larger Works.
Netscape Communications Corporation (''Netscape'') may publish revised and/or new versions of the License from time to time. Each version will be given a distinguishing version number. 6.2. Effect of New Versions.
6.3. Derivative Works.
8.2. If You initiate litigation by asserting a patent infringement claim (excluding declatory judgment actions) against Initial Developer or a Contributor (the Initial Developer or Contributor against whom You file such action is referred to as "Participant") alleging that: (a) such Participant's Contributor Version directly or indirectly infringes any patent, then any and all rights granted by such Participant to You under Sections 2.1 and/or 2.2 of this License shall, upon 60 days notice from Participant terminate prospectively, unless if within 60 days after receipt of notice You either: (i) agree in writing to pay Participant a mutually agreeable reasonable royalty for Your past and future use of Modifications made by such Participant, or (ii) withdraw Your litigation claim with respect to the Contributor Version against such Participant. If within 60 days of notice, a reasonable royalty and payment arrangement are not mutually agreed upon in writing by the parties or the litigation claim is not withdrawn, the rights granted by Participant to You under Sections 2.1 and/or 2.2 automatically terminate at the expiration of the 60 day notice period specified above. (b) any software, hardware, or device, other than such Participant's Contributor Version, directly or indirectly infringes any patent, then any rights granted to You by such Participant under Sections 2.1(b) and 2.2(b) are revoked effective as of the date You first made, used, sold, distributed, or had made, Modifications made by that Participant. 8.3. 
If You assert a patent infringement claim against Participant alleging that such Participant's Contributor Version directly or indirectly infringes any patent where such claim is resolved (such as by license or settlement) prior to the initiation of patent infringement litigation, then the reasonable value of the licenses granted by such Participant under Sections 2.1 or 2.2 shall be taken into account in determining the amount or value of any payment or license. 8.4. In the event of termination under Sections 8.1 or 8.2 above, all end user license agreements (excluding distributors and resellers) which have been validly granted by You or any distributor hereunder prior to termination shall survive termination.
http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF
The Original Code is ______________________________________. The Initial Developer of the Original Code is ________________________.
Portions created by
Contributor(s): ______________________________________. Alternatively, the contents of this file may be used under the terms of the _____ license (the “[___] License”), in which case the provisions of [______] License are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the [____] License and not to allow others to use your version of this file under the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the [___] License. If you do not delete the provisions above, a recipient may use your version of this file under either the MPL or the [___] License." [NOTE: The text of this Exhibit A may differ slightly from the text of the notices in the Source Code files of the Original Code. You should use the text of this Exhibit A rather than the text found in the Original Code Source Code for Your Modifications.] |
The U.S. Government LEGAL notice accompanies the MPL 1.1 document. |
This document comprises the LEGAL File pursuant to Articles 3.4 and 4 of the Mozilla Public License (version 1.1) stating the intellectual property and other limitations associated with the use of MAExplorer under this License. Dr. Peter Lemkin as an employee of The National Cancer Institute (NCI), an agency of the United States Government, is the Initial Developer of MAExplorer (the Original Code). As such, the following limitations apply to this License:
Object | Graphic | Similarity |
---|---|---|
A | ******* | 1.00 |
B | ***** | 0.83 |
C | ***** | 0.80 |
D | ** | 0.30 |
E | * | 0.17 |
F | | 0.06 |
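The star graphic in this listing can be reproduced with a few lines of Java. The scaling rule used here, floor(7 × similarity) stars, is an inference: it is chosen only because it reproduces the six rows shown, and the rule MAExplorer actually uses is not stated in the text. The class name `SimilarityBars` is hypothetical.

```java
/** Hypothetical sketch: draw a similarity value as a bar of asterisks. */
final class SimilarityBars {
    static final int MAX_STARS = 7;   // stars drawn for a similarity of 1.00

    /** Returns floor(MAX_STARS * similarity) asterisks (inferred scaling). */
    static String bar(double similarity) {
        int n = (int) Math.floor(similarity * MAX_STARS);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('*');
        }
        return sb.toString();
    }
}
```

With this rule a similarity of 0.83 and one of 0.80 both draw five stars, so the graphic is only a coarse view of the numeric column next to it.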
Zdiff(x,y,c) = Zscore(x,c) - Zscore(y,c)
Zscore(p,c) = (c - mean_p)/stdDev_p
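The two formulas can be checked numerically with a short sketch. It follows the formulas literally: `c` is the value being standardized, and each sample `p` is represented only by its mean and standard deviation, which are passed in as plain numbers. How MAExplorer obtains those statistics is not covered here, and the class name `ZScores` and its parameter names are hypothetical.

```java
/** Hypothetical sketch of the Zscore and Zdiff formulas. */
final class ZScores {
    /** Zscore(p,c) = (c - mean_p) / stdDev_p */
    static double zscore(double meanP, double stdDevP, double c) {
        return (c - meanP) / stdDevP;
    }

    /** Zdiff(x,y,c) = Zscore(x,c) - Zscore(y,c) */
    static double zdiff(double meanX, double stdDevX,
                        double meanY, double stdDevY, double c) {
        return zscore(meanX, stdDevX, c) - zscore(meanY, stdDevY, c);
    }
}
```

As a worked case, take c = 14, sample x with mean 10 and standard deviation 2, and sample y with mean 11 and standard deviation 3. Then Zscore(x,c) = (14 - 10)/2 = 2, Zscore(y,c) = (14 - 11)/3 = 1, and Zdiff(x,y,c) = 2 - 1 = 1.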