You can also create, edit, and save your own custom Array Layouts using the <user-define> Array Layout. Any custom layouts you define are saved and will appear in this pulldown menu the next time. User-defined Array Layouts are saved as text files in a subdirectory called "ArrayLayout", under the name used in the Edit Layout menu, with the extension ".alo". You may also remove array layouts using the "Remove Layout" button, which pops up a file dialog from which you can choose an Array Layout to delete. Downloading and installing newer versions of Cvt2Mae will not wipe out your custom array layouts.
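To illustrate where custom layouts are kept, the sketch below builds the path of a layout file and lists the saved layouts. It assumes the "ArrayLayout" subdirectory sits directly under a base directory you pass in (its real location relative to the Cvt2Mae installation is not stated here), and ArrayLayoutFiles is a hypothetical helper, not part of Cvt2Mae.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: locates user-defined Array Layout (.alo) files. */
final class ArrayLayoutFiles {
    static final String LAYOUT_DIR = "ArrayLayout"; // subdirectory named in the manual
    static final String EXT = ".alo";               // layout file extension

    /** Path where a layout with the given name would be saved (baseDir is an assumption). */
    static Path layoutPath(Path baseDir, String layoutName) {
        return baseDir.resolve(LAYOUT_DIR).resolve(layoutName + EXT);
    }

    /** Names of all saved layouts (file names without ".alo"), sorted alphabetically. */
    static List<String> listLayouts(Path baseDir) throws IOException {
        List<String> names = new ArrayList<>();
        Path dir = baseDir.resolve(LAYOUT_DIR);
        if (!Files.isDirectory(dir)) return names; // no custom layouts saved yet
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + EXT)) {
            for (Path f : files) {
                String fn = f.getFileName().toString();
                names.add(fn.substring(0, fn.length() - EXT.length()));
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Only files ending in ".alo" are reported, so other files in the same folder are ignored.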
Step 2 Select input file(s)
Depending on your data, there may be several variations on how the
data is configured. For example, some array applications save
multiple files for each sample, while others put multiple
samples in one file. Sometimes the GIPO data is available as a
separate GIPO file. Once the files have been selected, go to Step 3
below.
2.1 Selecting a single data file
Click on the "Browse input file name" button to pop up a file selection
window. Then select the data file for conversion. The file name(s)
will then appear in a text area below the "Browse input file name"
button. Some arrays (e.g. Affymetrix) may have data for multiple
samples in one file.
2.2 Selecting multiple files
You can convert multiple files. Repeatedly select files with the
"Browse input file name" button. Each time you select a file, it will
appear in a list below the "Browse input file name" button.
Some data will consist of multiple files, such as a separate GIPO file.
2.3 Separate GIPO file
Sometimes array data will have a separate GIPO file. You must click on
the "Separate GIPO" check box to activate the "Browse GIPO file"
button. Then click on the "Browse GIPO file" button to choose the
separate GIPO file. Note: A detailed description of GIPO files can be
found in the MAExplorer Reference Manual under Appendix C.
Step 3 Edit Array Layout
The "Edit Layout", "Assign GIPO Fields", and "Assign Quant Fields"
popup wizard windows are where you specify the parameters for your data
conversion. These parameters define which fields are used, what the
array geometry is, which rows hold the field names and the data, etc.
3.1 "Edit Layout" Button
This wizard lets you enter the layout parameters through a
step-by-step process. Optional parameters are marked
"opt" at the top of the wizard window.
Some arrays have multiple or duplicate fields. This is the
number of duplicated spot Fields in the array, where EVERY spot is
duplicated; for example, each grid of spots is
duplicated. We refer to these in MAExplorer as F1 and F2.
If there are no duplicates, then there is 1 field.
Number of Grids per Field. A grid contains
Grid Rows x Grid Columns spots.
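Together, these parameters describe a four-level geometry: the array holds #Fields x #Grids x Grid Rows x Grid Columns spot positions. The sketch below computes that total and, purely for illustration, a 0-based linear index for a 1-based (field, grid, row, column) position. SpotGeometry and its indexing order are hypothetical and not necessarily how MAExplorer numbers spots.

```java
/** Hypothetical sketch of the (#fields, #grids, #rows/grid, #cols/grid) spot geometry. */
final class SpotGeometry {
    final int nFields, nGrids, nGridRows, nGridCols;

    SpotGeometry(int nFields, int nGrids, int nGridRows, int nGridCols) {
        if (nFields < 1 || nGrids < 1 || nGridRows < 1 || nGridCols < 1)
            throw new IllegalArgumentException("all geometry values must be >= 1");
        this.nFields = nFields;
        this.nGrids = nGrids;
        this.nGridRows = nGridRows;
        this.nGridCols = nGridCols;
    }

    /** Total spot positions: every field repeats every grid. */
    int totalSpots() { return nFields * nGrids * nGridRows * nGridCols; }

    /** Spot positions in one field. */
    int spotsPerField() { return nGrids * nGridRows * nGridCols; }

    /** Illustrative 0-based index: field-major, then grid, then row, then column. */
    int linearIndex(int field, int grid, int row, int col) {
        return (((field - 1) * nGrids + (grid - 1)) * nGridRows + (row - 1)) * nGridCols + (col - 1);
    }
}
```

For example, a 2-field array of 10 grids, each 28 rows x 25 columns (arbitrary numbers), has 2 x 10 x 28 x 25 = 14000 spot positions, 7000 per field.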
Use the Molecular Dynamics 'NAME-GRC' specification, which encodes
(grid, row, column) in a single field; otherwise, use separate fields
for (grid, grid_row, grid_col).
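A NAME-GRC value packs the spot position into one string, such as "GRID- 1-R2C1" in the NAME_GRC example table in this section. The minimal Java sketch below splits such a value into (grid, row, column). The accepted pattern (optional spaces after "GRID-", then "-R", row, "C", column) is inferred from that example and is an assumption; NameGrc and parse are hypothetical names, not part of Cvt2Mae.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for NAME-GRC spot names such as "GRID- 1-R2C1". */
final class NameGrc {
    // Assumed pattern: "GRID-", optional spaces, grid number, "-R", row number, "C", column number.
    private static final Pattern GRC =
        Pattern.compile("GRID-\\s*(\\d+)-R(\\d+)C(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns {grid, row, column}, all 1-based; throws if the value does not match. */
    static int[] parse(String name) {
        Matcher m = GRC.matcher(name.trim());
        if (!m.matches())
            throw new IllegalArgumentException("not a NAME-GRC value: " + name);
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }
}
```

For instance, parse("GRID- 1-R2C1") yields grid 1, row 2, column 1.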
If you specify the array layout by Grid-geometry (ABOVE),
then enter (#Fields, #Grids, #Grid-rows, #Grid-cols). If
you specify the layout by the maximum number of spots in
the array (BELOW), Cvt2Mae estimates a pseudo-layout on which
the spots will fit, for visualization purposes only. This
pseudo-layout does not correspond to the actual array
layout, which you then do not have to enter.
This is the maximum number of spots that may occur in your data.
Number of the row containing the names of the multiple
samples, if the file contains multiple samples. (Row numbers
start at row 1.) If there are no sample names, set it to 0 or
leave it blank. If you change it from 0 to any positive row
number, it removes the input files and re-reads the first file
to get the proper data field names, so you can use them with the
Assign GIPO Fields and Assign Quant Fields assignment
operations.
Number of the row that contains the data file Field names.
Number of the first row that contains quantitative array Data in
the file. It is assumed that this row is followed by the
rest of the array data.
Number of the first row that contains array Data in the optional
separate GIPO file.
Leave it blank if there are no comment lines.
If you specify this, it checks for it in each data row.
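The row-number parameters above (row numbers start at 1) can be read as slicing the input text file: one row supplies the field names, data starts at the first data row and continues to the end, and comment lines are skipped. A minimal Java sketch, assuming a tab-delimited file and a comment-line prefix such as "#"; LayoutRows is a hypothetical name, and skipping blank lines is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how Edit Layout row numbers select parts of a text file. */
final class LayoutRows {
    final String[] fieldNames;                       // column names from the field-name row
    final List<String[]> dataRows = new ArrayList<>(); // one String[] per data row

    /**
     * @param lines         all lines of the input file
     * @param fieldNamesRow 1-based row holding the field names
     * @param firstDataRow  1-based first row of array data; all later rows are data too
     * @param commentPrefix lines starting with this are skipped; null or "" means no comments
     */
    LayoutRows(List<String> lines, int fieldNamesRow, int firstDataRow, String commentPrefix) {
        // Row numbers start at 1, Java list indices at 0.
        fieldNames = lines.get(fieldNamesRow - 1).split("\t", -1); // assumed tab-delimited
        boolean hasComments = commentPrefix != null && !commentPrefix.isEmpty();
        for (int i = firstDataRow - 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) continue;                              // skip blank lines
            if (hasComments && line.startsWith(commentPrefix)) continue; // skip comment lines
            dataRows.add(line.split("\t", -1));
        }
    }
}
```

With a one-line title above the header, the field names are on row 2 and the data starts on row 3.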
Data for MAExplorer is either ratio data, such as Cy3/Cy5, or
intensity data, such as P33.
Ratio data may be presented as either (Cy5/Cy3) or (Cy3/Cy5).
The dye to associate with quantified data intensity 1.
The dye to associate with quantified data intensity 2.
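Because the two ratio orientations are reciprocals, choosing (Cy5/Cy3) rather than (Cy3/Cy5) only inverts each ratio, and on a log2 scale it only flips the sign. The sketch below illustrates this; DyeRatio and its methods are hypothetical names, and it simply rejects non-positive intensities rather than modeling how MAExplorer handles them.

```java
/** Hypothetical sketch: the two ratio orientations are reciprocals of each other. */
final class DyeRatio {
    enum Orientation { CY3_OVER_CY5, CY5_OVER_CY3 }

    /** Ratio of the two channel intensities in the requested orientation. */
    static double ratio(double cy3, double cy5, Orientation o) {
        if (cy3 <= 0 || cy5 <= 0)
            throw new IllegalArgumentException("intensities must be positive");
        return (o == Orientation.CY3_OVER_CY5) ? cy3 / cy5 : cy5 / cy3;
    }

    /** log2 of the ratio; flipping the orientation only flips the sign. */
    static double log2Ratio(double cy3, double cy5, Orientation o) {
        return Math.log(ratio(cy3, cy5, o)) / Math.log(2.0);
    }
}
```

With Cy3 = 200 and Cy5 = 100, Cy3/Cy5 is 2.0 (log2 = 1) and Cy5/Cy3 is 0.5 (log2 = -1).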
The input data file includes background intensity data that
you want to include. You do NOT have to include that data.
Generate a microarray pseudo image using a representation of
the array based on Grids, Grid Rows, and Grid Columns.
Otherwise, use the (X,Y) data supplied for each spot, if it
exists. If this option is set, it overrides the actual
(X,Y) coordinates, even if that option is also selected.
The actual (X,Y) coordinate data exists for each spot. If
the data exists but you do NOT select this option, it will
use the pseudo-array option.
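The rules for the pseudo-image and actual (X,Y) options can be summarized as a small decision function. This is a sketch under stated assumptions: SpotPlacement and its parameters are hypothetical names, and falling back to the pseudo-array when (X,Y) is selected but absent from the file is an assumption based on the phrase "if it exists".

```java
/** Hypothetical sketch of the Edit Layout rules for choosing how spots are placed. */
final class SpotPlacement {
    enum Mode { PSEUDO_ARRAY, ACTUAL_XY }

    /**
     * @param usePseudoImage "generate pseudo image" option is set
     * @param useActualXY    "actual (X,Y) coordinates" option is set
     * @param xyInData       the input file actually contains (X,Y) data
     */
    static Mode choose(boolean usePseudoImage, boolean useActualXY, boolean xyInData) {
        if (usePseudoImage) return Mode.PSEUDO_ARRAY;        // overrides (X,Y) even if selected
        if (useActualXY && xyInData) return Mode.ACTUAL_XY;  // use the supplied coordinates
        return Mode.PSEUDO_ARRAY;                            // (X,Y) not selected or not present
    }
}
```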
Reuse the (X,Y) coordinates of the first sample for all
samples. Use this if you want to 'Flicker' array pseudo
images between two samples.
Reverse rows and columns in the microarray pseudo image.
The user data file has Location identifier data. These could
be 'probe_set' for Affymetrix, 'Incyte ID' for Incyte,
etc., and are used as the gene identifier if there are no other
IDs.
The user data file has I.M.A.G.E. 'Clone ID' data.
The user data file has 'GenBank' identifier data. See
http://ncbi.nlm.nih.gov/ for more information.
The user data file has 'UniGene' identifier data.
The user data file has 'dbEST' identifier data.
The user data file has 'LocusID' identifier data.
The user data file has 'SwissProt' identifier data.
See http://www.expasy.ch/ for more information.
The user data file has user Plate well identifier data. This
uniquely identifies the source of the spotted clone.
The Genomic IDs are encoded in the 'Description' field of the
user input file (the Affymetrix encoding of IDs). It will
find and generate genomic IDs for: Clone_ID if /cl=XXXX is in
the Description, GenBank if /gb=XXXX is in the Description, and
UniGene if /ug=XXXX is in the Description. If this switch is
enabled, the explicit ID options are disabled.
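The Description-field option amounts to scanning the text for /cl=, /gb= and /ug= tags. A minimal Java sketch; it assumes each value runs up to the next whitespace and keeps the first occurrence of each tag. DescriptionIds is a hypothetical name, not part of Cvt2Mae, and the description string in the usage example is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pull genomic IDs out of an Affymetrix-style Description field. */
final class DescriptionIds {
    // Assumption: each ID is written as /key=value and the value ends at whitespace.
    private static final Pattern TAG = Pattern.compile("/(cl|gb|ug)=(\\S+)");

    /** Maps "Clone_ID", "GenBank", "UniGene" to the IDs found; missing tags are absent keys. */
    static Map<String, String> extract(String description) {
        Map<String, String> ids = new LinkedHashMap<>();
        Matcher m = TAG.matcher(description);
        while (m.find()) {
            String key;
            switch (m.group(1)) {
                case "cl": key = "Clone_ID"; break;
                case "gb": key = "GenBank"; break;
                default:   key = "UniGene"; break;
            }
            ids.putIfAbsent(key, m.group(2)); // keep the first occurrence of each tag
        }
        return ids;
    }
}
```

Tags other than /cl=, /gb= and /ug= (for example a /len= tag) are ignored.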
The user data file has Gene Class ontology data for
each gene. [FUTURE]
The user data file has UniGene Name data. This could be used
if the default 'GeneName' description is not available.
The user data file has 'Quant' QualCheck data. This data is
on a per-spot basis for each array hybridization. The code
(see MAExplorer Reference Manual Appendix C Table C.4.2)
may be used to flag bad spots or missing spot data.
The user data file has GIPO QualCheck data. This data is on a
GIPO basis for the entire database. The code (see MAExplorer
Reference Manual Appendix C) may be used to flag bad gene
data.
This is the default name given to calibration DNA spotted on
the array for calibration purposes; it is indicated in the GIPO
file 'Clone ID' or 'GenBank ID' field.
This is the default name given in place of an I.M.A.G.E. Clone
ID when the researcher's clones have not yet been placed in
the I.M.A.G.E. repository and thus have no ID. It is indicated
in the GIPO file 'Clone ID' or 'GenBank ID' field.
If you have empty or blank rows of spot data, type in the name
used for them (e.g. 'empty'), if any.
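The calibration and missing-Clone-ID names are matched against the GIPO 'Clone ID' or 'GenBank ID' field. The hypothetical sketch below assumes the empty-spot name is matched the same way, uses exact case-sensitive comparison, and treats a blank field as empty. All of these are assumptions. The class SpotKind and the example names "calibDNA" and "noCloneID" are also hypothetical.

```java
/** Hypothetical sketch: classify a GIPO row by the special names entered in Edit Layout. */
final class SpotKind {
    enum Kind { CALIBRATION, NO_CLONE_ID, EMPTY, GENE }

    final String calibName, noCloneName, emptyName; // names typed into Edit Layout (may be null)

    SpotKind(String calibName, String noCloneName, String emptyName) {
        this.calibName = calibName;
        this.noCloneName = noCloneName;
        this.emptyName = emptyName;
    }

    /** True only if a non-empty special name was entered and it equals the ID exactly. */
    private static boolean is(String id, String name) {
        return name != null && !name.isEmpty() && id.equals(name);
    }

    /** idField is the row's 'Clone ID' (or 'GenBank ID') value. */
    Kind classify(String idField) {
        String id = (idField == null) ? "" : idField.trim();
        if (id.isEmpty() || is(id, emptyName)) return Kind.EMPTY; // blank treated as empty (assumption)
        if (is(id, calibName)) return Kind.CALIBRATION;
        if (is(id, noCloneName)) return Kind.NO_CLONE_ID;
        return Kind.GENE;                                         // an ordinary gene ID
    }
}
```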
Species name (Mouse, Human, ...). It is used to document the
Array Layout.
UniGene species prefix (Mouse Mm, Human Hs, etc.). This is
used in querying Genomic Web databases. If you do not see the
prefix you want in the choice menu, type it in.
The name you want to give to the created database.
Your name for the database subset used in the initial .mae startup file.
Generic name of the project to be used for all samples in the
database. If no name is specified, the name of the input data
files' folder is used.
Name of the program used to quantitate the spot data
from the sample images.
This is the name for the samples assigned to the 'X set'.
This is the name for the samples assigned to the 'Y set'.
Default cluster similarity threshold used in some of the
clustering methods. This is the initial value shown in
popup sliders.
The number of genes reported in gene Reports or in the
data Filter when this restriction is invoked.
Default # of clusters used in the K-means clustering method.
This is the initial value shown in popup sliders.
Default p-Value used in the t-Test data Filter.
This is the initial value shown in popup sliders.
Default Coefficient Of Variation used in the data Filters. This
is the initial value shown in popup sliders.
Default absolute difference threshold used in the data Filters.
This is the initial value shown in popup sliders.
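The Coefficient Of Variation default is a threshold on CV = standard deviation / mean. A minimal sketch of such a test for one gene's replicate values follows. Using the population standard deviation, and passing a gene when its CV does not exceed the threshold, are assumptions, not documented MAExplorer behavior. CvFilter is a hypothetical name.

```java
/** Hypothetical sketch of a coefficient-of-variation (CV) test for one gene's values. */
final class CvFilter {
    /** CV = standard deviation / |mean|, using the population standard deviation (assumption). */
    static double cv(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        double sum = 0;
        for (double v : values) sum += v;
        double mean = sum / values.length;
        if (mean == 0) throw new IllegalArgumentException("mean is zero; CV undefined");
        double sq = 0;
        for (double v : values) sq += (v - mean) * (v - mean);
        return Math.sqrt(sq / values.length) / Math.abs(mean);
    }

    /** Assumed rule: a gene passes if its CV does not exceed the threshold. */
    static boolean passes(double[] values, double cvThreshold) {
        return cv(values) <= cvThreshold;
    }
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9}, the mean is 5 and the population standard deviation is 2, so the CV is 0.4.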
3.2 "Assign GIPO Fields" Button
MAExplorer extracts data from the gene-in-plate-order (GIPO) gene
coordinate table. This table links spots in a microarray to Genomic
"gene IDs" and gene names. It may also contain Clone ID,
GenBank, dbEST, UniGene, and LocusID identifiers corresponding to these
Master Gene IDs. An optional table of Clone IDs and the Gene Classes
each gene belongs to may also be defined. The "Assign GIPO Fields"
button lets you associate your own fields, which may have different
names, with the fields MAExplorer uses. A detailed description of
the MAExplorer GIPO file can be found in the MAExplorer manual under
Appendix C.
Often GIPO files supplied by array vendors have additional fields not currently used by MAExplorer. You can leave them in (they will be ignored) or take them out (the database then loads faster).
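Field assignment can be thought of as a lookup from your column names to MAExplorer's field names, where any column you never assign (such as an extra vendor field) is skipped. GipoFieldMap is a hypothetical illustration, not the Cvt2Mae implementation, and the column names in the usage example ("IMAGE_ID", "Accession", "VendorNotes") are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "Assign GIPO Fields": user column names -> MAExplorer fields. */
final class GipoFieldMap {
    private final Map<String, String> userToMae = new HashMap<>();

    /** Record that the user's column userName holds the MAExplorer field maeField. */
    void assign(String userName, String maeField) {
        userToMae.put(userName, maeField);
    }

    /** For one header row, find the column index holding each assigned MAExplorer field. */
    Map<String, Integer> columnsFor(String[] header) {
        Map<String, Integer> cols = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            String mae = userToMae.get(header[i].trim());
            if (mae != null) cols.put(mae, i); // unassigned columns are simply ignored
        }
        return cols;
    }
}
```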
Step 4 Choose output folder/directory
This lets you choose where the converted data will be stored. We
recommend using the first option, "Create New project Folder".
Select Output Folder Options:
Step 5 Convert Data
Once you have set up all of the essential parameters, you can
convert your data by clicking on the green "Run" button. The status
lines show what the converter is doing. When it is done, the red
"Abort" button changes into a green "Done" button, which you press to
exit the program. This means that your data has been successfully
converted, and you can now go to the MAE folder and click on the
Startup.mae file to run MAExplorer on your converted data.
B. Generation of a pseudoarray geometry if no array geometry is specified
MAExplorer requires that the data in the GIPO and Quant files be
specified by spot position. This is indicated by the array spot
geometry (#fields, #grids, #rows/grid, #columns/grid). The #fields is
the number of duplicated sets of grids, if available; it is 1
otherwise. This 4-tuple must be specified in the Configuration file.
However, some array data does not have spot geometry position data
available. The alternative is to generate a pseudoarray geometry. This
is possible because the pseudoarray image in MAExplorer is used only
to indicate which spots pass the data filter, or relative differences,
depending on the "(Plot | Show Microarray)" menu option. The algorithm
presented below generates a geometry
(nGrids, nGridRows, nGridCols) that is compatible with this
visual use of the pseudoarray. The only input it requires is
nRowsExpected, the number of spots in the microarray (rows in
the database input file). The number of spots in the array is computed
automatically, and the option to use the pseudoarray instead of the
actual array geometry is selected in the
Edit Layout Wizard for Grid Geometry.
OPT_GRID_SIZE = 1200;                /* Optimal grid size for MAExplorer viewing */
ROWS_TO_COLS_ASPECT_RATIO = 3.0/4.0; /* desired rows/cols aspect ratio for a grid */
n = nRowsExpected;                   /* # of spots (rows in the input file) */
extra = 0;                           /* # of extra grid cols required */

/* Estimate # of grids. Assume a square aspect ratio */
if(n <= OPT_GRID_SIZE)
  nGrids = 1;
else
  nGrids = (n / OPT_GRID_SIZE) + 1;

/* Estimate rows (r) and columns (c) from a rectangular grid
 * where cols = (4/3) rows.
 * Then, c = (4/3)r and r*c = area.
 * Then (4/3)*r*r = area, or
 * r = sqrt((3/4)*area).
 */
if(nRowsExpected > 0)
  while(true)
  { /* iterate to optimal size */
    gridSize = n/nGrids;
    nGridRows = sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize);
    if(nGridRows < 1)
      nGridRows = 1;                 /* avoid an endless loop for very small n */
    nGridCols = (nGridRows / ROWS_TO_COLS_ASPECT_RATIO);
    nGridCols += extra;
    estTotSize = (nGrids * nGridRows * nGridCols);
    if(estTotSize > nRowsExpected)
      break;
    else
      extra++;                       /* keep trying until we meet the criteria */
  } /* iterate to optimal size */
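The pseudocode above can be turned into a small runnable Java method. This is a sketch, not the Cvt2Mae source: the class PseudoArrayGeometry and method compute are hypothetical names, and the sketch keeps nGridRows at least 1 so that very small inputs cannot loop forever. Because gridSize and nGridRows do not change inside the original loop, the sketch computes them once and only widens the column count.

```java
/** Runnable sketch of the pseudoarray geometry estimate (a Java port of the pseudocode). */
final class PseudoArrayGeometry {
    static final int OPT_GRID_SIZE = 1200;                     // optimal grid size for viewing
    static final double ROWS_TO_COLS_ASPECT_RATIO = 3.0 / 4.0; // rows/cols aspect for a grid

    /** Returns {nGrids, nGridRows, nGridCols} whose product exceeds nRowsExpected. */
    static int[] compute(int nRowsExpected) {
        if (nRowsExpected <= 0)
            throw new IllegalArgumentException("nRowsExpected must be > 0");
        int n = nRowsExpected;
        int nGrids = (n <= OPT_GRID_SIZE) ? 1 : (n / OPT_GRID_SIZE) + 1;
        int gridSize = n / nGrids;
        // r = sqrt((3/4) * area), clamped to 1 so tiny inputs cannot loop forever
        int nGridRows = Math.max(1, (int) Math.sqrt(ROWS_TO_COLS_ASPECT_RATIO * gridSize));
        int baseCols = (int) (nGridRows / ROWS_TO_COLS_ASPECT_RATIO); // c = (4/3) r
        int extra = 0;                                                // extra grid columns
        while ((long) nGrids * nGridRows * (baseCols + extra) <= n)
            extra++;                                                  // widen until all spots fit
        return new int[] {nGrids, nGridRows, baseCols + extra};
    }
}
```

For nRowsExpected = 1000 this gives 1 grid of 27 rows x 38 columns (1026 positions); for 5000 it gives 5 grids of 27 x 38 (5130 positions).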
grid  grid col  grid row  RawIntensity  Background
1     1         1         2226.8        32.6
1     1         2         1234.8        25.6
. . .
10    25        28        3333.8        23.6
grid  grid col  grid row  RawIntensity1  Background1  RawIntensity2  Background2
1     1         1         2226.8         32.6         2345.9         39.4
1     1         2         1234.8         25.6         1245.9         39.4
. . .
10    25        28        3333.8         23.6         3345.9         25.4
field  grid  grid col  grid row  RawIntensity  Background
1      1     1         1         2226.8        32.6
1      1     1         2         1234.8        25.6
. . .
1      10    25        28        3333.8        23.6
. . .
2      1     1         1         2226.8        39.4
2      1     1         2         1234.8        39.4
. . .
2      10    25        28        3333.8        25.4
NAME_GRC        RawIntensity1  RawIntensity2
GRID- 1-R1C1    2126.500       3662.350
GRID- 1-R2C1    2311.430       3306.290
GRID- 1-R3C1    3696.470       5780.310
GRID- 1-R4C1    3167.450       5245.440
. . .
grid  grid col  grid row  Cy3     Cy3Bkgd  Cy5     Cy5Bkgd
1     1         1         2226.8  32.6     2345.9  39.4
1     1         2         1234.8  25.6     1245.9  39.4
. . .
10    25        28        3333.8  23.6     3345.9  25.4