# Differential Expression (Probeset Level)

*Array Studio* contains a number of different modules for performing univariate analysis/differential expression on the probeset level, including One-Way ANOVA, Two-Way ANOVA, and the more advanced General Linear Model, as well as a few others. For probeset level, the differential expression analysis is similar to that discussed in MicroArray Tutorial. We will only provide an example of **General Linear Model** in this tutorial.

## Probeset Level Linear Model

The design of the experiment in this tutorial is set-up so that the user should perform a *Probeset Level* *Linear Model*. The first factor in the ANOVA is *tissue_type* while the second factor is *patient_id*. For each patient, there is a tumor and a normal sample, and we are interested in the difference between the two.

To run the *Probeset Level Linear Module*, go to the **Statistical Inference** section of the **workflow**, and select **Probeset Level Linear model**. Alternatively, the same module can be selected by going to the **MicroArray Menu | Inference | General Linear Model**.

This opens the *General Linear Model* window.

As with other analysis windows, the user must first set the *Project* and *Data* on which to run the analysis, in the *Input/Output* section. Make sure **Tutorial ExonArray** is chosen as the project and *Exon Data* is chosen as the input data.

For *Variables*, choose **Customized variables** and click **Select**. Choose the list that was generated earlier by the *Filter* command.

For *Observations*, choose **Customized Observations**, and then click the **Select** button to choose the list **ExonData.Observation19**.
This ensures that the statistical tests are only run on the good 19 observations, ignoring the one outlier chip.

Go to **Step 1: Specify Model**.

The two factors in this model are **tissue_type** and **patient number**. Use ctrl + click to select both of them and click the **Add** button.

*Patient* is random effect, so click the **Random** checkbox for **patient number**.
Click **OK** to return to the **General Linear Model** window.
Notice that the information of the specified model is displayed in the box under step 1.

Next, click **Specify Test** for comparisons.

This opens the *Specify Test* window, which allows the user to manually or automatically specify the tests (or comparisons). In this case, the user is interested in the difference between tumor samples and normal.

The easiest way to specify the comparison is to ensure that the **Term** box is set to **tissue_type**, click the **For each** box to set to **(none)**, and set **Compare to** as **Normal**.
In effect, this says that for every level of *tissue_type*, compare it to normal.
Since there are only two levels (tumor and normal), there will be one comparison.

Make sure that **Estimate**, **Fold Change**, **Raw p-values** and **Adjusted p-values** are checked, and then click **Add** to add the test. Add test will be displayed in the **TTests** box.

Click OK to return to the original *General Linear Model* window.

*Step 3* is optional, and includes a number of options that can be set for the *General Linear Model*. Please refer to MicroArray Tutorial for more details on the options.

The *Linear Model* option is now complete.
If the user is familiar with SAS code, clicking *Show SAS Code* will show the equivalent SAS code.

Click **Submit** to run the module.

This module should take approximately 6 minutes (Note: the length of time is dependent on the number of variables in this case over 1 million, as well as the type of model).

## The Volcano Plot View and Inference Report

After running the *General Linear Model* (the computing time should be a few minutes), a *Table* is generated under the *Inference* tab of the *Solution Explorer*, named *ExonData.Tests*.
This table contains the statistics report generated by the General Linear Model, together with a *VolcanoPlot* visualizing the pvalues vs. estimate.

Also notice that a new *List* has been automatically generated by the *General Linear Model*. This *List* can be used for purposes of filtering, and any other downstream analysis. However, for this experiment, there are actually no probesets that pass the adjusted p-value criteria of 0.05, so this list contains 0 probesets.

Double click on *VolcanoPlot* to open it. Notice that one *volcano plot* has been created in this view, for the comparison *Tumor vs. Normal*.

The *VolcanoPlotView* shows the *-Log10 (Raw P-value)* on the y-axis and the *Estimate* (*Estimate* is defined as the statistically adjusted difference between the means of the two groups being compared) on the x-axis. Thus, the most significant probesets are higher on the y-axis, while the mostly differentially expressed probesets can be found at the extremes of the x-axis. Similar to all views in Array Studio, the *VolcanoPlotView* is fully interactive. Please refer to MicroArray Tutorial for more details on these options.