2.3 Edit menu
The Edit menu includes operations to modify the
'edited gene list', which is set by a variety of Filters as well as
manually from this menu. The user may perform set operations (union,
intersection, and difference) on named sets of genes and on sets of sample
experiments (conditions). User preferences are also set in this menu.
Sets of genes or HP condition lists are very useful for tracking
complex data-mining sequences of analysis. For example, derived named
gene sets may be used in successive data filters and for reports. As
an illustration, one could do the following experiment given four different
types of HPs (e.g. virgin, pregnancy, lactation, and
involution).
First compare two HPs using a statistical test such as a
t-test. Then save the resulting set of genes under the name "virgin
vs. pregnancy". Then compare the next two HPs and save the resulting
genes under the name "lactation vs. involution". Finally, compute
the difference of genes found in "virgin vs. pregnancy" that are not
found in "lactation vs. involution". This resulting gene set could
then be saved (e.g. with a name "Genes found in virgin
vs. pregnancy, but not in lactation vs. involution"). Similarly,
taking the intersection of these two named sets shows genes that are
common between the two sets. Taking the union shows genes found in
either of the two named sets.
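The workflow above can be sketched with ordinary set operations. This is a minimal illustration in Python, not MAExplorer code; the gene names and the contents of the two sets are invented for the example, and the dictionary of named sets is only a stand-in for the saved gene sets.

```python
# Hypothetical results of two statistical tests (gene names are invented).
gene_sets = {
    "virgin vs. pregnancy": {"geneA", "geneB", "geneC", "geneD"},
    "lactation vs. involution": {"geneC", "geneD", "geneE"},
}

a = gene_sets["virgin vs. pregnancy"]
b = gene_sets["lactation vs. involution"]

difference = a - b    # genes in A that are not in B
intersection = a & b  # genes common to both sets
union = a | b         # genes found in either set

# Save the derived set under a new name, as in the text.
gene_sets[
    "Genes found in virgin vs. pregnancy, but not in lactation vs. involution"
] = difference

print(sorted(difference))
print(sorted(intersection))
print(sorted(union))
```

Note that the difference is not symmetric: `b - a` would instead give the genes found only in "lactation vs. involution".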
The Edit menu contains the following main selections. All of these entities
and preferences are saved as part of the startup state when you
do a (File | Databases | SaveAs ... DB).
You may define and edit arbitrary sets of genes using the User
edited gene list submenu to modify the 'Edited Gene List'
(EGL). This has sub-modes of operation for adding or removing genes
from the image by clicking on spots. If the Show 'Edited Gene
List' mode is set, you may see exactly which genes you have
defined by the magenta squares drawn around each gene in the EGL. Many
of the clustering operations will leave the current cluster in the
EGL. The commands include:
- Show 'Edited Gene List' [CB] - toggle showing the EGL as
magenta boxes in the pseudoarray image. If enabled, genes set by
manual selection or as the result of some filtering operations are shown.
- Don't edit [RB] - clicking on a spot does nothing
(i.e. disables 'click to add (remove) genes to (from) the E.G.L.').
- Click to add gene to E.G.L. (Ctrl/click) [RB] -
clicking on a spot adds the corresponding gene to the 'Edited
Gene List'.
- Click to remove gene from E.G.L. (Shift/click) [RB] -
clicking on a spot removes the corresponding gene from the
'Edited Gene List'.
- Set 'Edited Gene List' to Filtered genes - capture the
current Filtered genes into the E.G.L.
- Clear 'Edited Gene List' - remove all genes from the E.G.L.
This gives you the functionality of adding and deleting genes from a
user defined list of genes to be analyzed. The EGL may be used with
the gene-set operations discussed in Section
2.3.2. You may also define genes in the EGL using the "Gene Name
Guesser" shown in Figure 2.3.1.
Figure 2.3.1 Edited Gene List defined from the Gene Name Guesser
using wildcards. The Edited Gene List was defined as the set of
genes whose names contain the sub-string "onco". The sub-string was
specified to the popup guesser window as "*onco*" using '*' characters
as wildcard symbols indicating that it should match any or no
characters. The button Gene Name may be toggled through a set
of other identifiers including Clone ID, UniGene ID, dbEST 3', dbEST
5', GenBank 3', and GenBank 5', LocusID, etc. depending on what
identifiers are available in your database. The user then pressed the
Set E.G.L. button on the guesser window that sets the E.G.L. to
those genes. If you have enabled the View menu "Show 'edited gene
list'" option, then the genes in the EGL are shown as magenta squares
in the pseudoarray image. You may do additional editing to
manually add or remove genes in the set. If a
2D scatter plot was being used, EGL labeled genes would appear there
as well. To select a particular gene as the current gene, click on the
gene you want in the list, then press the Done button.
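The wildcard matching described in the caption can be approximated as follows. This is a sketch only: it uses Python's standard `fnmatch` module rather than MAExplorer's own matcher, the function name `guess_genes` and the gene names are hypothetical, and case-insensitive matching is an assumption. Unlike the guesser as described, `fnmatch` also treats `?` and `[...]` as special characters.

```python
from fnmatch import fnmatchcase


def guess_genes(pattern, gene_names):
    """Return the names matching a '*' wildcard pattern (case ignored by assumption)."""
    pattern = pattern.lower()
    return [name for name in gene_names if fnmatchcase(name.lower(), pattern)]


# Illustrative gene names only.
gene_names = [
    "Oncostatin M",
    "proto-oncogene c-fos",
    "Carbonic Anhydrase-III",
    "onco",
]

# '*' matches any or no characters, so "onco" itself also matches "*onco*".
edited_gene_list = set(guess_genes("*onco*", gene_names))
print(sorted(edited_gene_list))
```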
These commands let you do comparisons of sets of genes generated under
different criteria. In addition, you may compute derived gene sets
from existing gene sets using set operations (OR, AND,
DIFFERENCE). You may also normalize the data by a gene subset. The
user may save the genes defined by: 1) the Filter, or 2) the
manually defined 'Edited Gene List'. The gene set resulting
from a binary gene set operation OR (union), AND (intersection), or
DIFFERENCE is saved in a new named gene set. The set difference (A-B) is
defined as the genes in set A that are not in set B. Genes in set B
that are not in set A are ignored. The 'User Filter Gene Set' may be
set to any gene set and may then be used as part of the gene Filter
cascade. The 'User Normalization Gene Set' may be set to any gene set
and may then be used to normalize gene intensity values across
hybridized samples. (See normalization
algorithm for more information on this method.)
If you are running MAExplorer in stand-alone mode, the current named
gene sets are saved when you save the DB using the Save disk DB
or Save as disk DB selections in the Databases submenu of the
File menu. The gene sets are saved in a State sub directory as
".cbs" files and are used to restore the gene sets when restarting
MAExplorer on a .mae startup file. The .mae startup file saves the
names of the .cbs files that are shared among the various startup
files for a given project. The implication then is that if you change
and save a gene set in one startup database, it will change in other
startup databases when they load that gene set. The advantage is that
different startup databases may view a gene set produced by another
database.
The Sets of genes operations in the Edit menu include:
The following is an example of List saved gene sets state
listing the catalog of named gene subsets in some of the MGAP
data. Note that sets #1 to #11 are fixed by the data in the GIPO file
and may not be changed by the user. Sets #12 to #14 are assignable
from other sets or, in the case of the E.G.L., by various MAExplorer
operations. Sets #1 through #14 may not be removed whereas #15 and
higher may be removed.
User Gene Sets
Set# |#genes| title
=======================
#1 |1727| ALL GENES
#2 |394| ALL NAMED GENES
#3 |246| ESTs similar to genes
#4 |456| ESTs
#5 |1096| All genes and ESTs
#6 |1681| Good genes
#7 |40| Replicate genes
#8 |0| HousekeepingGenes
#9 |96| Calibration DNA
#10 |77| Your plates
#11 |46| Empty wells
--------- User Assignable ----------
#12 |0| User Filter Gene Set
#13 |60| Edited Gene List
#14 |0| Normalization Gene Set
--------- User definable------------
#15 |60| The 60 genes closest to Carbonic Anhydrase-III
#16 |30| Named genes in the 60 genes closest to CA-III
#17 |4| Replicate genes in the 60 genes closes to CA-III
The following figure illustrates selecting sets by name for gene set
operations.
Figure 2.3.2 Selection of gene sets for binary gene set
operations. This example computes the Boolean AND of two sets "ALL
NAMED GENES" and "60 genes closest to CA-III from Named and Ests", and
then the AND of the "Replicates" with the previous result. The first
result is saved in the set called "The 60 genes closest to Carbonic
Anhydrase-III". The second result is saved in the set called "Named
genes in the 60 genes closest to CA-III". Finally, the third result
is saved in the set named "Replicate genes in the 60 genes closes to
CA-III".
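The catalog behaviour described above (named sets, chained binary operations saved under new names, and built-in sets that may not be removed) can be sketched as follows. The class `GeneSetCatalog`, its method names, and the toy gene identifiers are all hypothetical; this is not how MAExplorer stores its gene sets, only an illustration of the rules in the text.

```python
class GeneSetCatalog:
    """Toy catalog of named gene sets; the first n_protected sets cannot be removed."""

    def __init__(self, n_protected):
        self.n_protected = n_protected
        self.names = []  # set number = position in this list + 1
        self.sets = {}

    def save(self, name, genes):
        if name not in self.sets:
            self.names.append(name)
        self.sets[name] = frozenset(genes)

    def combine(self, op, name_a, name_b, result_name):
        """Apply OR, AND or DIFFERENCE and save the result as a new named set."""
        a, b = self.sets[name_a], self.sets[name_b]
        result = {"OR": a | b, "AND": a & b, "DIFFERENCE": a - b}[op]
        self.save(result_name, result)
        return self.sets[result_name]

    def remove(self, name):
        if self.names.index(name) < self.n_protected:
            raise ValueError(f"'{name}' is a built-in set and cannot be removed")
        self.names.remove(name)
        del self.sets[name]


# Three protected sets here for brevity (the listing above has fourteen).
catalog = GeneSetCatalog(n_protected=3)
catalog.save("ALL NAMED GENES", {"g1", "g2", "g3"})
catalog.save("Replicate genes", {"g2", "g5"})
catalog.save("Edited Gene List", set())
catalog.save("Closest genes", {"g2", "g3", "g4", "g5"})

named_closest = catalog.combine(
    "AND", "ALL NAMED GENES", "Closest genes", "Named genes in closest")
replicate_named = catalog.combine(
    "AND", "Replicate genes", "Named genes in closest", "Replicate genes in closest")

catalog.remove("Closest genes")  # allowed: a user-defined set
try:
    catalog.remove("ALL NAMED GENES")  # refused: a built-in set
except ValueError as err:
    print(err)
```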
In addition, MAExplorer can operate on sets of hybridized samples. For
example, a sample set might be replicate hybridized samples from the
same biological experiment sample, or it could be repeated experiments
on different samples of the same type. (One must be careful in
mixing data between the two cases because of the different expected
sources of variance). This means you can treat multiple replicate
samples as a distribution and compare the mean values for each gene in
one set of samples with the mean values for another set of samples. We
call these sets of hybridized samples condition lists or HP
lists. You may then put one or more HP samples into a condition
set. These sets in turn can be used for computing statistics on
clonal differences between different condition sets. Note each
condition set may have multiple (i.e. different) samples. These
condition sets are saved with the user state when doing a
(File | Databases | SaveAs DB). As with sets of genes, there are a
number of operations to manipulate HP condition sets in the Sets of
Conditions menu, including:
- Choose named condition lists of samples - define or edit
new named lists of hybridized samples.
- List saved HP condition lists - list the saved HP condition lists.
- List contents of saved HP condition list - for a particular
condition.
- Save HP-X as condition list - save current HP-X 'set' to a named
HP condition list.
- Save HP-Y as condition list - save current HP-Y 'set' to a named
HP condition list.
- Save HP-E as condition list - save current HP-E 'list' to a named
HP condition list.
- Assign saved condition list to HP-X - set the current HP-X 'list'
to the saved condition list.
- Assign saved condition list to HP-Y - set the current HP-Y 'list'
to the saved condition list.
- Assign saved condition list to HP-E - set the current HP-E 'list'
to the saved condition list.
- OR (Union) of 2 condition lists - make a new condition that is
the union of two named condition lists (i.e. conditions that are found
in either list).
- AND (Intersection) of 2 condition lists - make a new condition
that is the intersection of two named condition lists (i.e. conditions
that are found in both lists).
- Difference of 2 condition lists - make a new condition that is
the difference of two named condition lists (i.e. the first list less
conditions in the second list).
- Rename HP list - rename a saved HP condition list
- Load HP condition list from disk file - [Future]
- Remove HP list - remove a saved HP condition list
The following is an example of List saved HP condition lists
state listing the catalog of named HP condition lists.
Condition Lists
===============
Condition[1] #HPs 2, [Initial HP-X: C57B6 pregnancy day 13]
Condition[2] #HPs 2, [Initial HP-Y: Stat5a (-,-) pregnancy day 13]
Condition[3] #HPs 4, [Initial HP-E expression list]
The following is an example of List contents of saved HP condition
list state.
Condition List #1 [Initial HP-X: C57B6 pregnancy day 13]
====================================
HP[1] Pregnancy 13 (1 hr) [C57B6-p13-totalRNA5ug]
HP[2] Pregnancy 13 (1 hr) [C57B6-p13.2poly-A]
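The idea of treating the samples in a condition list as a distribution and comparing per-gene means between two condition lists can be sketched as follows. The sample names, intensity values, and the use of a simple ratio of means are assumptions made for illustration; the statistics MAExplorer actually computes are not described in this section.

```python
from statistics import mean

# intensity[sample][gene]: hypothetical normalized intensity values.
intensity = {
    "X1": {"geneA": 10.0, "geneB": 4.0},
    "X2": {"geneA": 14.0, "geneB": 6.0},
    "Y1": {"geneA": 3.0, "geneB": 5.0},
    "Y2": {"geneA": 5.0, "geneB": 5.0},
}

# Each condition list names the hybridized samples it contains.
condition_lists = {"HP-X": ["X1", "X2"], "HP-Y": ["Y1", "Y2"]}


def mean_per_gene(samples):
    """Average each gene's intensity over the samples of one condition list."""
    genes = intensity[samples[0]].keys()
    return {g: mean(intensity[s][g] for s in samples) for g in genes}


mean_x = mean_per_gene(condition_lists["HP-X"])
mean_y = mean_per_gene(condition_lists["HP-Y"])

# One possible comparison: ratio of the condition means for each gene.
ratio = {g: mean_x[g] / mean_y[g] for g in mean_x}
print(ratio)
```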
The Preferences submenu is used to set various data labels,
statistical limits and other parameters. These include:
The Font Family submenu is used to set
the text font family. This may be useful if your computer is missing
some fonts or some fonts are easier to read than others. Note: some
fonts may not work well on your computer. If this is the case, try
another font. When you save the data mining session with the "SaveAs
file DB", it also saves the font you have set. For some plots or popup
text-windows, you may have to regenerate the popup window to see the
font changes.
Figure 2.3.4.1 Popup window allowing you to adjust all threshold
slider values. The Adjust all Filter threshold scrollers
command allows you to pre-adjust all threshold slider values used in
data filtering and in clustering. It may be easier to set the
approximate range before invoking the clustering operation because
changing a parameter will recluster your data.
The Define HP-X (HP-Y) class name command may be used to change
the names of the HP-X (HP-Y) experimental condition sets. These names
are used in various labels in the main window, popup plots and
reports, etc. The commands to change various names of database
components are in the Preferences submenu in the Edit menu.
