2.3 Edit menu

The Edit menu includes operations to modify the 'Edited Gene List', which may be set from a variety of Filters as well as manually from this menu. The user may perform set operations (union, intersection, and difference) on named sets of genes and on sets of sample experiments (conditions). User preferences are also set in this menu.

Sets of genes or HP condition lists are very useful for tracking complex data-mining sequences of analysis. For example, derived named gene sets may be used in successive data filters and for reports. As an illustration, one could do the following experiment given four different types of HPs (e.g. virgin, pregnancy, lactation, and involution).
First compare two HPs using a statistical test such as a t-test. Then save the resulting set of genes under the name "virgin vs. pregnancy". Then compare the next two HPs and save the resulting genes under the name "lactation vs. involution". Finally, compute the difference of genes found in "virgin vs. pregnancy" that are not found in "lactation vs. involution". This resulting gene set could then be saved (e.g. with a name "Genes found in virgin vs. pregnancy, but not in lactation vs. involution"). Similarly, taking the intersection of these two named sets shows genes that are common between the two sets. Taking the union shows genes found in either of the two named sets.
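The named-set workflow above can be sketched with ordinary Python sets. This is only an illustration of the set algebra, not MAExplorer's implementation: the gene identifiers, the `named_sets` dictionary, and the `gene_set_op` helper are hypothetical, and the two starting sets stand in for the results of the statistical tests.

```python
# Hypothetical gene identifiers; in practice each set would be the result
# of a Filter (e.g. a t-test between two HPs) saved under a name.
named_sets = {
    "virgin vs. pregnancy": {"GeneA", "GeneB", "GeneC", "GeneD"},
    "lactation vs. involution": {"GeneC", "GeneD", "GeneE"},
}


def gene_set_op(a, b, op):
    """Return a new gene set computed from sets a and b (hypothetical helper)."""
    if op == "OR":          # union: genes found in either set
        return a | b
    if op == "AND":         # intersection: genes common to both sets
        return a & b
    if op == "DIFFERENCE":  # A-B: genes in a that are not in b
        return a - b
    raise ValueError(f"unknown set operation: {op}")


a = named_sets["virgin vs. pregnancy"]
b = named_sets["lactation vs. involution"]

only_first = gene_set_op(a, b, "DIFFERENCE")
common = gene_set_op(a, b, "AND")
either = gene_set_op(a, b, "OR")

# Save the derived set under a new name, as in the text.
named_sets[
    "Genes found in virgin vs. pregnancy, but not in lactation vs. involution"
] = only_first

print(sorted(only_first))  # ['GeneA', 'GeneB']
print(sorted(common))      # ['GeneC', 'GeneD']
print(sorted(either))      # ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE']
```

Note that the difference is not symmetric: swapping the two arguments in this sketch would yield only `GeneE`.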

The Edit menu contains the following main selections. All of these entities and preferences are saved as part of the startup state when you do a (File | Databases | SaveAs ... DB).

2.3.1 User edited gene list - the 'Edited Gene List' menu

You may define and edit arbitrary sets of genes using the User edited gene list submenu to modify the 'Edited Gene List' (EGL). It has sub-modes of operation for adding or removing genes by clicking on spots in the image. If the Show 'Edited Gene List' mode is set, you may see exactly which genes you have defined by the magenta squares drawn around each gene in the EGL. Many of the clustering operations will leave the current cluster in the EGL. The commands include:

This gives you the functionality of adding and deleting genes from a user-defined list of genes to be analyzed. The EGL may be used with the gene-set operations discussed in Section 2.3.2. You may also define genes in the EGL using the "Gene Name Guesser" shown in Figure 2.3.1.

Gene 'Guesser' used to find a particular gene or subset of genes by key phrase

Figure 2.3.1 Edited Gene List defined from the Gene Name Guesser using wildcards. The Edited Gene List was defined as the set of genes whose names contain the sub-string "onco". The sub-string was specified in the popup guesser window as "*onco*", using '*' characters as wildcard symbols that match any or no characters. The Gene Name button may be toggled through a set of other identifiers including Clone ID, UniGene ID, dbEST 3', dbEST 5', GenBank 3', GenBank 5', LocusID, etc., depending on what identifiers are available in your database. The user then pressed the Set E.G.L. button on the guesser window, which sets the E.G.L. to those genes. If you have enabled "Show 'edited gene list'" in the View menu, the genes in the EGL are shown as magenta squares in the pseudoarray image. You may do additional editing to manually add or remove genes in the set. If a 2D scatter plot is being used, EGL-labeled genes appear there as well. To select a particular gene as the current gene, click on the gene you want in the list, then press the Done button.
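The wildcard matching described in the caption can be approximated with Python's standard `fnmatch` module, in which '*' also matches any run of characters, including none. This is a rough sketch and not the guesser's actual code: the gene names and the `guess_genes` helper are hypothetical, and matching is assumed to be case-sensitive, which the manual does not state.

```python
from fnmatch import fnmatchcase


def guess_genes(names, pattern):
    """Return the names matching a '*' wildcard pattern (hypothetical helper).

    fnmatchcase is case-sensitive; whether the real guesser ignores case
    is not stated in the manual, so this is an assumption.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical gene names for illustration only.
gene_names = [
    "oncogene c-fos",
    "proto-oncogene c-myc",
    "carbonic anhydrase III",
    "onco",
]

# "*onco*" matches any name containing the sub-string "onco".
edited_gene_list = guess_genes(gene_names, "*onco*")
print(edited_gene_list)
```

Unlike the guesser as described, `fnmatch` also treats '?' and '[...]' as special characters, so names containing those characters would need escaping in a pattern.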


2.3.2 Sets of genes menu

These commands let you compare sets of genes generated under different criteria. In addition, you may compute derived gene sets from existing gene sets using set operations (OR, AND, DIFFERENCE). You may also normalize the data by a gene subset. The user may save the genes defined by: 1) the Filter, or 2) the manually defined 'Edited Gene List'. The gene set resulting from a binary gene set operation OR (union), AND (intersection), or DIFFERENCE is saved in a new named gene set. The set difference (A-B) is defined as the genes in set A that are not in set B. Genes in set B that are not in set A are ignored. The 'User Filter Gene Set' may be set to any gene set and may then be used as part of the gene Filter cascade. The 'User Normalization Gene Set' may be set to any gene set and may then be used to normalize gene intensity values across hybridized samples. (See normalization algorithm for more information on this method.)

If you are running MAExplorer in stand-alone mode, the current named gene sets are saved when you save the DB using the Save disk DB or Save as disk DB selections in the Databases submenu of the File menu. The gene sets are saved in a State sub directory as ".cbs" files and are used to restore the gene sets when restarting MAExplorer on a .mae startup file. The .mae startup file saves the names of the .cbs files that are shared among the various startup files for a given project. The implication then is that if you change and save a gene set in one startup database, it will change in other startup databases when they load that gene set. The advantage is that different startup databases may view a gene set produced by another database. The Sets of genes operations in the Edit menu include:

The following is an example of List saved gene sets state listing the catalog of named gene subsets in some of the MGAP data. Note that sets #1 to #11 are fixed by the data in the GIPO file and may not be changed by the user. Sets #12 to #14 are assignable from other sets or in the case of the E.G.L, by various MAExplorer operations. Sets #1 through #14 may not be removed whereas #15 and higher may be removed.

   User Gene Sets
   Set# |#genes| title
   =======================
    #1 |1727| ALL GENES
    #2 |394|  ALL NAMED GENES
    #3 |246|  ESTs similar to genes
    #4 |456|  ESTs
    #5 |1096| All genes and ESTs
    #6 |1681| Good genes
    #7 |40|   Replicate genes
    #8 |0|    HousekeepingGenes
    #9 |96|   Calibration DNA
    #10 |77|  Your plates
    #11 |46|  Empty wells
   --------- User Assignable ----------
    #12 |0|   User Filter Gene Set
    #13 |60|  Edited Gene List
    #14 |0|   Normalization Gene Set
   --------- User definable------------
    #15 |60|  The 60 genes closest to Carbonic Anhydrase-III
    #16 |30|  Named genes in the 60 genes closest to CA-III
    #17 |4|   Replicate genes in the 60 genes closes to CA-III

The following figure illustrates selecting sets by name for gene set operations.

Gene subsets may be computed from other gene subsets using Boolean operations

Figure 2.3.2 Selection of gene sets for binary gene set operations. This example computes the Boolean AND of two sets "ALL NAMED GENES" and "60 genes closest to CA-III from Named and Ests", and then the AND of the "Replicates" with the previous result. The first result is saved in the set called "The 60 genes closest to Carbonic Anhydrase-III". The second result is saved in the set called "Named genes in the 60 genes closest to CA-III". Finally, the third result is saved in the set named "Replicate genes in the 60 genes closes to CA-III".


2.3.3 Sets of sample conditions menu

In addition, MAExplorer can operate on sets of hybridized samples. For example, a sample set might consist of replicate hybridized samples from the same biological experiment sample, or of repeated experiments on different samples of the same type. (One must be careful in mixing data between the two cases because of the different expected sources of variance.) This means you can treat multiple replicate samples as a distribution and compare the mean values for each gene in one set of samples with the mean values for another set of samples. We call these sets of hybridized samples condition lists or HP lists. You may put one or more HP samples into a condition set. These sets in turn can be used for computing statistics on clonal differences between different condition sets. Note that each condition set may have multiple (i.e. different) samples. These condition sets are saved with the user state when doing a (File | Databases | SaveAs DB). As with sets of genes, there are a number of operations to manipulate HP condition sets in the Sets of Conditions menu, which includes:

The following is an example of List saved HP condition lists state listing the catalog of named HP condition lists.

   Condition Lists
   ===============
   Condition[1] #HPs 2, [Initial HP-X: C57B6 pregnancy day 13]
   Condition[2] #HPs 2, [Initial HP-Y: Stat5a (-,-) pregnancy day 13]
   Condition[3] #HPs 4, [Initial HP-E expression list]

The following is an example of List contents of saved HP condition list state.

   Condition List #1 [Initial HP-X: C57B6 pregnancy day 13]
   ====================================
   HP[1] Pregnancy 13  (1 hr) [C57B6-p13-totalRNA5ug]
   HP[2] Pregnancy 13  (1 hr) [C57B6-p13.2poly-A]
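The idea of treating the samples in a condition set as a distribution and comparing per-gene means between two condition sets can be sketched as follows. This is a minimal illustration under stated assumptions, not MAExplorer's statistics: the intensity values, the gene names, the two Stat5a sample names, and the `condition_means` helper are all hypothetical, and only a difference of means is computed, not a t-test.

```python
from statistics import mean

# Hypothetical intensities per sample: sample name -> {gene: intensity}.
# The first two sample names follow the listing above; the Stat5a sample
# names and all numeric values are invented for illustration.
samples = {
    "C57B6-p13-totalRNA5ug": {"GeneA": 10.0, "GeneB": 4.0},
    "C57B6-p13.2poly-A": {"GeneA": 14.0, "GeneB": 6.0},
    "Stat5a-p13-rep1": {"GeneA": 5.0, "GeneB": 5.0},
    "Stat5a-p13-rep2": {"GeneA": 7.0, "GeneB": 5.0},
}

# A condition set is a named list of samples (HPs).
condition_x = ["C57B6-p13-totalRNA5ug", "C57B6-p13.2poly-A"]
condition_y = ["Stat5a-p13-rep1", "Stat5a-p13-rep2"]


def condition_means(condition, samples):
    """Return {gene: mean intensity over the samples in the condition set}."""
    genes = samples[condition[0]].keys()
    return {g: mean(samples[s][g] for s in condition) for g in genes}


mean_x = condition_means(condition_x, samples)
mean_y = condition_means(condition_y, samples)

# Per-gene difference of condition means (X minus Y).
difference = {g: mean_x[g] - mean_y[g] for g in mean_x}
print(difference)  # {'GeneA': 6.0, 'GeneB': 0.0}
```

A real comparison would also account for the variance within each condition set, which is why the caution above about mixing replicate types matters.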

2.3.4 Setting user preferences menu

The Preferences submenu is used to set various data labels, statistical limits and other parameters. These include:

The Font Family submenu is used to set the text font family. This may be useful if your computer is missing some fonts or some fonts are easier to read than others. Note: some fonts may not work well on your computer. If this is the case, try another font. When you save the data mining session with the "SaveAs file DB", it also saves the font you have set. For some plots or popup text-windows, you may have to regenerate the popup window to see the font changes.



Figure 2.3.4.1 Popup window allowing you to adjust all threshold slider values. The Adjust all Filter threshold scrollers command allows you to pre-adjust all threshold slider values used in data filtering and in clustering. It may be easier to set the approximate range before invoking the clustering operation because changing a parameter will recluster your data.



Dialog query to change HP-X Condition set name

The Define HP-X (HP-Y) class name command may be used to change the names of the HP-X (HP-Y) experimental condition sets. These names are used in various labels in the main window, popup plots and reports, etc. The commands to change various names of database components are in the Preferences submenu in the Edit menu.