This section briefly addresses some of the issues you need to consider. A full discussion of these issues is beyond the scope of this manual; they are covered in more focused statistical methods literature, and you might also address them in consultation with biostatisticians. The Internet has vast resources for microarrays. A few to get you started: a microarray citation electronic library (http://arrayit.com/e-library/), the National Library of Medicine PubMed journal search engine, and a general microarray Listserv (GENE-ARRAYS@ITSSRV1.UCSF.EDU). The MGED group (Brazma, 2001) has published the MIAME (Minimum Information About a Microarray Experiment) standard, which specifies the information that should accompany a microarray experiment. This information is useful in doing an analysis. Also try searching using general Internet search engines. There are a number of public microarray data repositories. One that we find useful is NCBI's GEO (Gene Expression Omnibus), which contains array data and MIAME-compliant information about the arrays.
A good and appropriate experimental design (i.e. the design and setting up of experiments to subsequently be analyzed) is critical for resolving significant differences in gene expression between experimental conditions. We touch on some of the issues here. Simon (2001), Dudoit (2000), and Kerr and Churchill (2001a, 2001b) discuss some of the issues of experimental design for microarrays. We do not currently implement the Kerr-Churchill method. However, some of the issues involved in experimental design based on the types of arrays are discussed in Section 3.1.1 for (Cy3/Cy5)-labeled as well as 33P-labeled samples.
If users are comparing two different types of samples, the analysis would be different than if they were comparing an ordered sequence of samples (e.g. time series, cell cycle, dose-response, tumor-stage, etc.). MAExplorer gives users the ability to:
Briefly, data mining is the discovery of potentially interesting patterns in the data that were previously unknown. One approaches the analysis of a set of data with minimal expectations. However, some idea of what you are interested in helps focus the search. But beware of the trap of mining the data until you get the results you hope for. The following figure helps illustrate this process.
Proper experimental design of microarray experiments is critical to successful use of microarray data. Several recent reports discuss some of the key issues involved in various aspects of statistical analysis of microarrays: (Radmacher, 2001), (McShane, 2001), (Korn, 2001), (Simon, 2001), (Dudoit, 2000).
a) (Cy3X/Cy5X1) / (Cy3Y/Cy5Y1) becomes
b) (Cy3X/Cy3Y)
However, this new comparison is accompanied by additional noise because of the use of the two Cy5P intermediaries.
An alternative method would be to compute (Cy3X/Cy5Y) directly. However, this too has its own sources of error and other problems, namely that not all genes are labeled symmetrically with the two dyes since different dyes may have different sequence specific affinities due to a variety of causes. For that reason, dye-swap experiments are often done. I.e. the two samples would be run as (Cy3X/Cy5Y) as well as (Cy3Y/Cy5X). If one were to plot (Cy3X/Cy5Y) against 1.0/(Cy3Y/Cy5X) and the data were perfectly symmetric (which they are not) then one would expect a straight line. That is generally not what you get in practice.
Another issue is that when you have a number of samples A, B, C, D, ..., N and wish to compare them, there are a number of alternate experimental designs you can use with different resulting sets of advantages and problems. If a common pooled Cy5P sample P were used, then the following experiments would be done:
(Cy3A/Cy5P), (Cy3B/Cy5P), ... , (Cy3N/Cy5P)
This assumes that there is enough of the pooled sample P to be used for all of the experiments - otherwise additional sources of error would be introduced. MAExplorer is ideally used with this common reference sample P. If a common pooled sample is not used, then the experimental design becomes more complicated - especially if dye-swap experiments are performed for all samples. For N samples taken 2 at a time (i.e. Cy3 and Cy5), the number of experiments may be impossibly large to perform for other than a very small N. E.g. for N of 3, the number of experiments is 3, or 6 if dye-swap experiments are also performed. For N of 4, the number of experiments is 6, or 12 with dye swaps. And this is without doing any replicate experiments. If a reasonable number of replicates is added, then this set of experiments becomes even more difficult to perform.
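The experiment counts quoted above follow from counting pairs of samples. The short sketch below reproduces them; the function names are ours, chosen for illustration, and are not part of MAExplorer.

```python
from math import comb


def pairwise_experiments(n_samples: int, dye_swap: bool = False) -> int:
    """Hybridizations needed to compare every pair of samples directly.

    Each unordered pair of samples is one (Cy3, Cy5) experiment; a dye-swap
    design runs every pair a second time with the dyes exchanged.
    """
    pairs = comb(n_samples, 2)  # N samples taken 2 at a time
    return 2 * pairs if dye_swap else pairs


def pooled_reference_experiments(n_samples: int) -> int:
    """Hybridizations needed when every sample is run against a pooled sample P."""
    return n_samples


for n in (3, 4, 10, 20):
    print(n, pairwise_experiments(n), pairwise_experiments(n, dye_swap=True),
          pooled_reference_experiments(n))
```

The pairwise count grows roughly as N squared, while the pooled-reference design grows only linearly in N, which is one reason the common reference sample is attractive.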
MAExplorer is currently not oriented to handling these large combinatoric types of non-pooled sets of experiments. However, you do have the ability to swap (Cy3,Cy5) data on an individual basis so you could compute an average of data from dye-swap experiments - but with the caveats of non-uniform labeling mentioned above.
[(Cy3X/Cy5Y) + 1.0/(Cy3Y/Cy5X)]/2
In general, this is probably not a very good estimate.
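As a small illustration of the dye-swap average given above, the sketch below evaluates the formula for one gene. The function and argument names are hypothetical, and the sketch inherits all of the caveats about non-uniform labeling.

```python
def dye_swap_average(cy3x_over_cy5y: float, cy3y_over_cy5x: float) -> float:
    """Average the X/Y ratio estimates from a dye-swap pair for one gene.

    cy3x_over_cy5y is the ratio measured with X labeled Cy3 and Y labeled Cy5.
    cy3y_over_cy5x is the ratio from the swapped run; its reciprocal is a
    second estimate of X/Y.
    """
    if cy3y_over_cy5x == 0:
        raise ValueError("swapped ratio is zero; its reciprocal is undefined")
    return (cy3x_over_cy5y + 1.0 / cy3y_over_cy5x) / 2.0


# Perfectly symmetric labeling: both runs give the same X/Y estimate.
print(dye_swap_average(2.0, 0.5))
# Asymmetric labeling: the two estimates (2.0 and 2.5) disagree.
print(dye_swap_average(2.0, 0.4))
```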
Direct user manipulation of data, as incorporated in MAExplorer, was defined by (Schneiderman, 1997) who defends the position that the direct manipulation of data in data mining is an extremely effective means to amplify human creativity in understanding patterns. Schneiderman's dogma states "overview first, zoom, and then filter details on demand" and favors the use of "shallow search trees, slide controllers, and information-right screens with tightly coordinated panel view of data", (Beardsly, 1999). MAExplorer also uses many of these direct manipulation principles. It was designed to run on desktop computers with data residing on the same computer and loaded into its memory for rapid direct manipulation - for both the Web browser and stand-alone versions.
Part of the Flicker system allows comparison of user 2D gel images with standard images from SWISS-2DPROT for putative identification of unknown spots in the user gels. The user would select a standard 2D gel image from over 20 tissue types, enter their own 2D gel image and align them at spots of interest. They could then switch to a database access mode, click on those spots and generate popup SWISS-2DPROT Web pages for those proteins - similar to Clone reports in MAExplorer. That is accessed at http://www.lecb.ncifcrf.gov/flicker/swissProtIdFlkPair.html.
MAExplorer will have a groupware facility similar to what we have done with our WebGel (http://www.lecb.ncifcrf.gov/webgel/) system described in (Lemkin et al., 1999b). It is a two-dimensional electrophoresis system for sharing data analyses. In WebGel, users may perform a data-mining analysis and leave the state of their analysis and accompanying notes to share with their collaborators on a login-protected basis.
We now discuss using these tools for analyzing one's data.
Table 3.2 Steps in a data-mining analysis.
In designing a data mining experiment, the first decision to be made is selecting the set of hybridized samples to be compared (steps 1 and 2). This is accomplished by setting the current hybridized sample-X (HP-X) and hybridized sample-Y (HP-Y). In Figure 2.4.4.2 for the scatter plot we selected a single C57B6 pregnancy day 13 and a single Stat5a (-,-) pregnancy day 13 as current HP-X and current HP-Y samples. Changing the normalization changes the view in the scatter plot so that hidden differences may be more apparent (see Figure 2.4.2.3).
The names of the current HP-X and HP-Y samples are displayed at the top of the main window. The current HP-X and HP-Y samples may be changed at any time by clicking on a new sample from a list of samples shown on the left side of the main window or from lists of samples organized by sample population in the Samples menu.
The next decision to be made is selection of the genes to be studied by choosing a subset from the gene class menu list (step 4). Further selection occurs throughout the analysis by clicking on spots in microarray images, points in graphic plots or cells in spreadsheets, by adjusting threshold sliders, or using the text-entry "guesser" to type in gene names, clone IDs, genomic IDs, samples, etc.
The next decision the user must make is to set the intensity data normalization mode (step 3). Normalization of quantitative data is crucial when comparing data between different hybridized microarrays because of spotting, hybridization efficiency, uniformity, and other systematic errors.
Genes of interest may be separated from all of the genes in the database using a cascade of data filters (step 4). Additional filtering options are easily accessible in the (data) Filter menu. Some of the filters require additional parameters. These parameters are set by state scroll bars that pop up on the screen when data filters requiring them are added to the filter cascade. Changing scroller values causes the data filter to be automatically reapplied and a new set of genes to be computed.
It is desirable to reduce false-positives found by the data filter by eliminating genes with high quantification variability between duplicate spots on the same sample or between spots duplicated in replicate samples. If duplicate genes are available on the array (denoted by Field 1 and Field 2 or F1 and F2 spots), this allows the computation of a coefficient of variation (CV) for the duplicates. This CV may be used in a data filter to reduce potential false-positives. CV is computed as 2|F1-F2|/(F1+F2) using those spot values for each gene, or as StdDevHP/MeanHP for a set of replicate hybridized samples.
Graphical views of the data give the user additional insights into the data. These include spot intensity and ratio or Zdiff pseudoarray images, scatter plots, histogram plots, expression profile plots, cluster plots showing genes similar to a specified gene, the number of clustered genes for each clone, divisive clusters for K-means clustering, and clustergrams and dendrograms for hierarchical clustering.
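The two CV formulas above can be sketched as follows. The function names and the `max_cv` cutoff are illustrative assumptions. The use of the sample standard deviation (n-1 denominator) for replicate samples is also an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def duplicate_spot_cv(f1: float, f2: float) -> float:
    """CV for duplicate F1/F2 spots, using the manual's 2|F1-F2|/(F1+F2)."""
    total = f1 + f2
    if total == 0:
        raise ValueError("F1 + F2 is zero; CV is undefined")
    return 2.0 * abs(f1 - f2) / total


def replicate_sample_cv(intensities: list[float]) -> float:
    """CV for one gene across replicate samples: StdDevHP / MeanHP.

    Assumption: stdev() is the sample standard deviation (n-1 denominator).
    """
    return stdev(intensities) / mean(intensities)


def passes_cv_filter(cv: float, max_cv: float) -> bool:
    """Keep a gene only if its variability is at or below the cutoff."""
    return cv <= max_cv


cv = duplicate_spot_cv(4.5267, 6.2408)  # F1/F2 values from a report example
print(round(cv, 4), passes_cv_filter(cv, max_cv=0.25))
```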
When there are too many EP-plots to be viewed simultaneously, you might use a scrollable list of expression profile plots that lets you scroll through an arbitrarily large list of genes. However, it is difficult to compare genes that are not sorted in some way (i.e. clustered). Therefore, these are most useful when used after clustering the data and displaying the scrollable EP-plots of the cluster-order data.
Clustering is one way of possibly finding co-expressed genes that exhibit similar expression changes in a set of samples. Genes may show similar co-expression, but that does not prove they are co-regulated at the same point in a pathway - merely that measurements of those genes in a particular set of experiments show similar expression. However, identifying genes with similar expression for which some information is already known about some of the genes may be useful as a starting point to help figure out gene function and pathway using additional experiments and analysis.
There are many methods for doing clustering - each with advantages and disadvantages. We present three methods in MAExplorer and plan on adding a variety of more powerful methods through the MAEPlugin facility under development.
The first cluster method finds a cluster of genes whose expression profiles are similar to that of the currently selected gene. This list of genes is restricted by the constraint that the cluster distance between each of these genes to the selected gene is less than the "Cluster threshold" distance set by the user with a scroll bar. It displays genes that are found both with blue boxes (the larger the box, the higher the similarity) and in a text report window showing the genes and their distances to the current gene. By varying the threshold and observing the results, the user can find a set of highly correlated genes. If the threshold is set to 0.0, no genes are found. If it is set too high, all data filtered genes are found. So it is critical to adjust the threshold to a reasonable level commensurate with the type of data being analyzed and the approximate number of genes expected.
A second cluster method draws blue circles in the array image around all filtered genes meeting the threshold criteria, where the larger the circle, the larger the number of similar genes (i.e. genes passing the threshold) found to be clustered with that gene. Clicking on a gene toggles between the first and second methods. Both of these methods pop up a "Cluster Distance" threshold scroller and recompute the clusters if you change the scroller value or the current gene. This method also shows a text report that displays the number of genes similar to each data filtered gene.
A third method, called "K-means" clustering, finds K genes (which we call primary nodes) whose expression profiles are most orthogonal to each other. It uses the current gene as the first or "seed" node. It then finds the gene furthest from this and assigns it as node 2. Then the gene furthest from both nodes 1 and 2 is assigned to node 3, etc. This process is repeated until all K nodes are assigned. Then the remaining genes are assigned to the closest node. Having defined the initial cluster centers, it recomputes the centroid of each of the clusters. The centroid can alternatively be computed using a median instead of a mean, in which case we would be doing K-median clustering (Bickel, 2001). K genes are then reassigned to the nearest new centroids as the new K-means node instances. Finally, the remaining genes are assigned to the nearest centroid. A scrollable K-means cluster text window report pops up with genes sorted by cluster. Clicking on a gene in either the array image or scatter plot assigns all genes in the cluster to which that gene belongs to the "current cluster". Genes in the current cluster are labeled in the array and scatter plot with a small numeral giving the cluster number. In addition, genes in the current cluster are copied to the E.G.L. where they can be used in a report, saved in a named gene set, or used for additional filtering. It also pops up an "N-clusters" scroll bar window to let you dynamically adjust the number of clusters. Changing N will recompute the clusters. When the K-means is recomputed, it uses the current gene as the initial seed gene.
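The seeding and assignment steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not MAExplorer's implementation. "Furthest from both nodes" is interpreted as maximizing the minimum distance to the nodes already chosen, and the centroid update is repeated until assignments stop changing. All function names are hypothetical.

```python
from math import sqrt


def distance(p, q):
    """Scaled Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def pick_primary_nodes(profiles, seed_index, k):
    """Start from the seed gene, then repeatedly add the gene furthest from
    all nodes chosen so far (assumption: largest minimum distance)."""
    nodes = [seed_index]
    while len(nodes) < k:
        candidates = [i for i in range(len(profiles)) if i not in nodes]
        nodes.append(max(
            candidates,
            key=lambda i: min(distance(profiles[i], profiles[n]) for n in nodes)))
    return nodes


def assign(profiles, centers):
    """Assign each gene to the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda c: distance(p, centers[c]))
            for p in profiles]


def k_means(profiles, seed_index, k, max_iter=20):
    centers = [list(profiles[i]) for i in pick_primary_nodes(profiles, seed_index, k)]
    labels = assign(profiles, centers)
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(profiles, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


profiles = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
labels, centers = k_means(profiles, seed_index=0, k=3)
print(labels)
```

Replacing the mean in the centroid update with a median would give the K-median variant mentioned above.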
The fourth method is a hierarchical clustering method that generates a clustergram and dendrogram similar to that of Eisen's red-black-green clustergram (Eisen, 1998). This was derived from the clustered correlation map (ClusCor) of Weinstein et al. (Weinstein, 1997). The MAExplorer clustergram and dendrogram are dynamic and may be interrogated and used to set the current gene. This means that it may also position a corresponding ordered list of expression profile plots to the same gene so you may view the data as a plot as well. The dendrogram may be zoomed in to explore a part of the dendrogram in more detail. As with the K-means clustering, a report can be made of the ordered genes.
Then, the expression profile is expressed as a list of values:
$$e_j = (v_{j1}, v_{j2}, v_{j3}, \ldots, v_{jN})$$
A difference between two genes p and q may be estimated as an N-dimensional metric "distance" between $e_p$ and $e_q$. The Euclidean distance is then defined as
$$d_{pq} = \left( \frac{1}{N} \sum_{i=1}^{N} (v_{pi} - v_{qi})^2 \right)^{1/2}$$
Other distance measures include the correlation coefficient, the city-block (or Manhattan) distance, etc.
For data scaled such that $d_{pq}$ has a maximum value of 1.0 over all samples, a similarity measure could be computed as 1.0 - distance, or
$$s_{pq} = 1 - d_{pq}$$
A gene j is then considered similar to a selected gene s when
$$d_{js} < T$$
The threshold T is set by the investigator and in MAExplorer is changed using a slider. Typically, the set of all genes {gj} found is sorted by similarity before being viewed.
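The distance, similarity, and threshold test above can be sketched in a few lines. The function names and the dictionary of gene profiles are hypothetical, and the profiles are assumed to be already scaled so that distances fall between 0 and 1.

```python
from math import sqrt


def euclidean_distance(ep, eq):
    """d_pq = sqrt((1/N) * sum over samples of (v_pi - v_qi)^2)."""
    if len(ep) != len(eq) or not ep:
        raise ValueError("profiles must be non-empty and of equal length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(ep, eq)) / len(ep))


def similarity(ep, eq):
    """s_pq = 1 - d_pq, meaningful when distances are scaled to at most 1.0."""
    return 1.0 - euclidean_distance(ep, eq)


def similar_genes(profiles, selected, threshold):
    """Genes j with d_js < T, sorted from most to least similar to gene s."""
    ref = profiles[selected]
    hits = [(name, euclidean_distance(profile, ref))
            for name, profile in profiles.items() if name != selected]
    return sorted((h for h in hits if h[1] < threshold), key=lambda h: h[1])


profiles = {
    "g1": [0.1, 0.2, 0.3],
    "g2": [0.1, 0.2, 0.4],
    "g3": [0.9, 0.8, 0.7],
}
print(similar_genes(profiles, "g1", threshold=0.1))
```

Because the comparison is strict, a threshold of 0.0 returns no genes, and a threshold above the largest distance returns every other gene.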
Algorithm:
D can get quite large for clustering a large number of genes N [for N=5000, this is > 50 Mbytes!]
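The memory figure can be checked with simple arithmetic. The sketch below assumes that D is stored as a triangular gene-by-gene matrix including the diagonal, with 4-byte floating point values. Both are assumptions on our part; the text states only the approximate size.

```python
def triangular_matrix_bytes(n_genes: int, bytes_per_value: int = 4) -> int:
    """Bytes for a triangular N x N distance matrix, diagonal included."""
    entries = n_genes * (n_genes + 1) // 2
    return entries * bytes_per_value


def square_matrix_bytes(n_genes: int, bytes_per_value: int = 4) -> int:
    """Bytes for a full N x N distance matrix."""
    return n_genes * n_genes * bytes_per_value


print(triangular_matrix_bytes(5000) / 1e6, "Mbytes (triangular)")
print(square_matrix_bytes(5000) / 1e6, "Mbytes (full square)")
```

Under these assumptions the triangular form needs just over 50 Mbytes for N=5000, and the full square form needs twice that.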
The following is a simplified definition of one way to compute a hierarchical clustering of gene expression profile data.
Algorithm:
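As a rough illustration of hierarchical clustering, the following sketch uses average-linkage agglomerative clustering. This is a standard textbook approach and an assumption here; it is not necessarily the linkage method or the implementation used in MAExplorer, and the function names are hypothetical. It is written for clarity rather than speed.

```python
from math import sqrt


def distance(p, q):
    """Scaled Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def average_linkage(profiles):
    """Repeatedly merge the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of gene indices. The history defines the
    dendrogram, and the last merge gives a leaf ordering for a clustergram.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise gene distances between the clusters.
                total = sum(distance(profiles[i], profiles[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


for left, right, d in average_linkage([[0, 0], [0, 1], [10, 10]]):
    print(left, right, round(d, 3))
```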
[<field>-<grid name><row#>,<col#>]. e.g. [1-A4,3]
If there is only one field in the array, it will appear as field 1. In the above example, [1-A4,3] is field 1 grid A row 4 and column 3. Note that the pseudoarray coordinates are for visualization purposes in MAExplorer and may or may not be the same as the coordinates on the actual array. That depends on how the MAExplorer database was defined in the configuration file described in Appendix C.
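If you post-process saved reports, the coordinate notation can be split into its parts with a regular expression. The sketch below is an illustration only. The `SpotCoordinate` type and `parse_spot_coordinate` function are hypothetical names, and the pattern assumes that grid names consist of letters only.

```python
import re
from typing import NamedTuple


class SpotCoordinate(NamedTuple):
    field: int
    grid: str
    row: int
    col: int


# Assumption: [<field>-<grid name><row#>,<col#>] with a letters-only grid name.
_COORDINATE = re.compile(r"\[(\d+)-([A-Za-z]+)(\d+),(\d+)\]")


def parse_spot_coordinate(text: str) -> SpotCoordinate:
    match = _COORDINATE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a pseudoarray coordinate: {text!r}")
    field, grid, row, col = match.groups()
    return SpotCoordinate(int(field), grid, int(row), int(col))


print(parse_spot_coordinate("[1-A4,3]"))
```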
When the current gene is defined, it will draw a yellow (green) circle around the spot in the ratio (intensity) pseudoarray image and display other features of the gene in the three-line status area near the top of the main window. If background correction is enabled (the "Use background intensity correction" in the Normalization menu), then spot intensity values will appear as intensity' (with background intensity subtraction) and intensity (without background subtraction).
There are a number of different reporting formats available depending on the array display mode and particular normalization method selected. These include: the pseudoarray image of the intensity of a single sample, the pseudocolor ratio X/Y or Zdiff (X-Y) image (using either HP 'sets' or single samples), or the ratio of Cy3/Cy5 for dual-labeled dyes or F1/F2 for replicate spots for a single sample. In addition, the normalization mode is also displayed in the reporting line. We will present examples of each of these different reporting formats.
You may show the intensity data for a particular spot in the currently displayed pseudoarray image. First select the "Pseudograyscale image" option in the "Show Microarray" submenu in the "Plot" menu. If your data has duplicate grids (i.e. fields F1 and F2) then you may look at F1, F2 and mean (F1+F2)/2 data in the reports when you click on a spot. If the "Gang F1-F2 scrolling" switch is disabled in the "View" menu, then the intensity value is the intensity data value for the gene at that location. If the "Gang F1-F2 scrolling" switch is enabled, then it reports intensity[F1], intensity[F2], and the F1/F2 ratio. These two formats are shown in the following two examples for a C57B6 pregnancy day 13 sample in the MGAP database:
a) Field F1 spot for a single spot in a single sample with the median intensity selected.
[1-A4,5] intensity=4.5267, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
b) Field F1 and F2 replicate spots for a single sample. The top line is shown for each of the different normalization methods.
[1-A4,5] intensity[F1]=-0.3067, intensity[F2]=-0.2312, F1-F2=-0.0755, (Norm.: Zscore intensity)
[1-A4,5] intensity[F1]=4.5267, intensity[F2]=6.2408, F1/F2=0.7253, (Norm.: median intensity)
[1-A4,5] intensity[F1]=0.8755, intensity[F2]=1.1457, F1-F2=-0.2701, (Norm.: log median intensity)
[1-A4,5] intensity[F1]=-0.1442, intensity[F2]=-0.0945, F1-F2=-0.0497, (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] intensity[F1]=-0.1533, intensity[F2]=-0.1004, F1-F2=-0.0528, (Norm.: Z-score, mean abs.deviation, log intensity)
[1-A4,5] intensity[F1]=630.9911, intensity[F2]=869.9273, F1/F2=0.7253, (Norm.: calibration DNA intensity)
[1-A4,5] intensity[F1]=1919.9376, intensity[F2]=2646.957, F1/F2=0.7253, (Norm.: scale to max. (65K) intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
If the "Pseudocolor HP-X/HP-Y ratio or Zdiff" option is selected in the "Show Microarray" submenu, data is reported as either Ratio or Zdiff data depending on the normalization method selected. The data used in the following examples is for C57B6 pregnancy day 13 (HP-X) compared with Stat5a (-,-) pregnancy day 13 (HP-Y).
c) Ratio data for two samples X and Y in separate hybridized arrays. Ratio data for the field F1 and F2 spot data as well as the mnX/mnY ratio is reported. The median normalization was used in this example.
[1-A4,5] HP-XY: mn(X,Y)=(5.383,6.834) (X/Y)(F1,F2,mean)=(0.651,0.928,0.787), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
d) Zdiff data for two separate samples X and Y. Ratio data for the field F1 and F2 spot data as well as the mnX-mnY Zscore difference is reported. The three Zscore, ZscoreLog, and logMean normalizations were used in this example (first lines are shown).
[1-A4,5] HP-XY: mn(X,Y)=(-0.269,0.151) (X-Y)(F1,F2,mean)=(-0.470,-0.370,-0.420), (Norm.: Zscore intensity)
[1-A4,5] HP-XY: mn(X,Y)=(-0.119,0.051) (X-Y)(F1,F2,mean)=(-0.199,-0.142,-0.170), (Norm.: Z-score, stdDev, log intensity)
[1-A4,5] HP-XY: mn(X,Y)=(1.010,1.224) (X-Y)(F1,F2,mean)=(-0.362,-0.064,-0.213), (Norm.: log median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
e) Example of when the "Use dual HP-X & HP-Y Pseudoimage" mode is enabled in the "Show Microarray" submenu of the "Plot" menu. This displays mean data for the HP-X and HP-Y data side-by-side. The median normalization was selected.
[1-A4,5] intensity[X]=5.3837, intensity[Y]=6.8342, X/Y=0.7877, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
f) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-X 'set' of three C57B6 samples.
[1-A4,5] HP-X 'set' mean intensity=3.295 stdDev=1.482 CV=0.449 n=3, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
g) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-Y 'set' of five Stat5a (-,-) samples.
[1-A4,5] HP-Y 'set' mean intensity=8.180 stdDev=0.986 CV=0.120 n=5, (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
h) Multiple HP-XY 'sets' using median normalization for the pseudoarray image display for the HP-X and HP-Y 'sets' when the "Use dual HP-X & HP-Y Pseudoimage" mode is enabled in the "Show Microarray" submenu of the "Plot" menu.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120) n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
i) Multiple HP-XY 'sets' using median normalization for ratio (HP-X/HP-Y) data for the "Pseudocolor HP-X/HP-Y Ratio or Zdiff" display.
[1-A4,5] HP-XY 'sets': mn(X,Y)=(3.295,8.180) mnX/mnY=0.402 SD(X,Y)=(1.482,0.986) CV(X,Y)=(0.449,0.120) n(X,Y)=(3,5), (Norm.: median intensity)
CloneID: 1248228, dbEST3': 2279072, GenBankAcc3': AI463183, UniGene: Mm.13859, plate[5,A,5]
GeneName: Mus musculus ribosomal protein L41 mRNA, complete cds
j) Multiple HP-XY 'sets' p-value using median normalization for ratio (HP-X/HP-Y) data for the "Pseudocolor (HP-X,HP-Y) 'sets' p-value" display.
[1-A7,20] HP-XY: mn(X,Y)=(3.449,0.853) (X/Y)(F1,F2,mean)=(4.09,4.008,4.041), (Norm.: median intensity)
CloneID: 1382656, dbEST5': 1775754, GenBank 5': AI036495, UniGene: Mm.300, plate[12,A,8]
GeneName: Carbonic anhydrase 3
[1-A6,11] Cy5/Cy3=0.3588, Cy5=67.324, Cy3=187.622, (Norm.: median intensity)
CloneID: IMAGE:1054189, GeneName: expressed sequence AW213287
[1-A5,16] intensX=4.695, intensY=5.923, (X-Y)=-1.2275, (Norm.: log median intensity)
CloneID: IMAGE:963758, GeneName: RIKEN cDNA 2410114O14 gene
For the intensity and ratio threshold filters, the range interpretation may be inside or outside the specified range. The ratio range [R1:R2] is between 0.01 and 100.0. The Zdiff ranges [Z1:Z2] and [CZ1:CZ2] are between -4.0 and +4.0. The intensity threshold range [I1:I2] is set to the dynamic range of the min and max intensity for the current normalization method.
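The inside/outside interpretation of a threshold range can be sketched as a simple predicate. The function names and the `mode` argument are hypothetical, and treating the range limits as inclusive is an assumption. The sample ratios are taken from the report examples earlier in this section.

```python
def in_range_filter(value: float, low: float, high: float, mode: str = "inside") -> bool:
    """Test a value against [low:high], interpreted as inside or outside."""
    if low > high:
        raise ValueError("low must not exceed high")
    inside = low <= value <= high  # assumption: limits are inclusive
    if mode == "inside":
        return inside
    if mode == "outside":
        return not inside
    raise ValueError(f"unknown mode: {mode!r}")


def filter_genes(ratios: dict, low: float, high: float, mode: str = "inside") -> list:
    """Names of genes whose X/Y ratio passes the range test."""
    return [gene for gene, ratio in ratios.items()
            if in_range_filter(ratio, low, high, mode)]


ratios = {"ribosomal protein L41": 0.787, "L41 ('sets')": 0.402, "Carbonic anhydrase 3": 4.041}
# Genes changed by more than two-fold in either direction.
print(filter_genes(ratios, 0.5, 2.0, mode="outside"))
```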
A list of possible threshold sliders is shown in the following table. When a Filter is enabled that requires a slider, it pops up the State Scrollers window that contains one or more sliders. When you disable all filters that use these sliders, the popup window will disappear. The corresponding Ratio R1[R2] or Zdiff Z1[Z2] sliders are used if you are using a ratio or Zscore normalization - and will change if the normalization changes while the filter is active.
Some of the sliders are implemented with a non-linear scale so that you have more resolution at the low end (e.g. p-Value, Spot CV, Diff HP-XY).
Depending on the set of data Filters selected, there may be multiple sliders present in the State Slider popup window (e.g. see Figure 2.4.3).
Table 3.3.1. List of threshold sliders. Sliders are enabled in the State-Scroller popup window when the corresponding data filters are enabled.
Slider name | Associated with operation |
---|---|
Spot Intensity SI1 | Filter by spot intensity range per channel |
Spot Intensity SI2 | Filter by spot intensity range per channel |
Percent SI OK | Filter by whether the percent of spots whose spot intensity is in the threshold range meets the AT LEAST or AT MOST criterion |
Intensity I1 | Filter by gene intensity range |
Intensity I2 | Filter by gene intensity range |
Ratio R1 | Filter by gene X/Y ratio range |
Ratio R2 | Filter by gene X/Y ratio range |
Zdiff Z1 | Filter by gene X-Y Zdiff range |
Zdiff Z2 | Filter by gene X-Y Zdiff range |
Ratio CR1 | Filter by Cy3/Cy5 gene X/Y ratio range |
Ratio CR2 | Filter by Cy3/Cy5 gene X/Y ratio range |
Zdiff CZ1 | Filter by gene (Cy3-Cy5) X-Y Zdiff range |
Zdiff CZ2 | Filter by gene (Cy3-Cy5) X-Y Zdiff range |
p-Value | Filter by t-Test |
Spot CV | Filter by Coefficient of Variation |
Cluster Distance | Plot - cluster by expression similarity |
# of Clusters | Plot - K-means clustering |
Diff HP-XY | Filter by absolute difference (HP-X,HP-Y) |
Spot Quality | Filter by continuous spot quality (If data available) |
If you are running on a windowing system supporting cut and paste, then you may cut and paste data from reports and plots into applications on your system that allow you to save or print this data. Set the Report menu table-format to "Tab-delimited". Then, in Windows 95/98/NT/2000/XP, cut data from the popup tables (or other text reports) and paste it into Microsoft Excel. In Windows, you can capture (i.e. "cut") the entire screen by pressing the "Prt Sc" or print screen button. To capture a specific window (e.g. a scatter plot), hold the "Alt" key when pressing the "Prt Sc" key. Then go into a Windows imaging application (such as PhotoShop) and paste it into the application. In PhotoShop, in the File menu, select New (or type Control/N). Then when the window is opened, click on the window and paste the MAExplorer screen you had cut into the image window by typing Control/V. In both Excel and PhotoShop you may print the data or save it in a file.