# Machine Learning Tutorial #8: Non-Parametric Analysis

## Overview

In the previous chapter, we used a t-test to determine whether there were significant differences in classification accuracies compared to chance. While this approach has been used by several researchers in the past (e.g., Li et al., 2009), the nature of the classification accuracies violates some of the assumptions of the t-test, which is a parametric test. For example, this test assumes that the values are normally distributed around zero; in the case of classification accuracies, however, all of the numbers are constrained to be positive. (A review of the problems of applying parametric methods to classification results can be found in Allefeld et al., 2016).

To remedy this, we can use other analyses. Here, we will review how to do non-parametric analyses with a toolbox called Statistical non-Parametric Mapping (SnPM).

SnPM can be downloaded by clicking on this link and clicking on `Registration`. You will be required to fill out a form with your email address. When you have finished filling out the form, you will be able to download the package. Unzip it, open a Terminal, navigate to the folder `~/spm12/toolbox/`, and type:

```mv ~/Downloads/SnPM-devel-SnPM13.1.08 SnPM13
```

Then open a Matlab terminal, click on the `Set Path` button, and click `Add Folder`. Select the folder `SnPM13` in the toolbox directory, click `Save`, and then close the window.

## Creating the SnPM Batch

From the Matlab terminal, open SPM by typing `spm fmri`. Click on `Batch`, then select `SPM -> Tools -> SnPM -> Specify -> Multisub: One-sample T-test on diffs/contrasts`. This will open a new editor window. Add the following modules as well: `SPM -> Tools -> SnPM -> Compute` and `SPM -> Tools -> SnPM -> Inference`.

From the Matlab terminal, navigate to the Haxby_Data directory, and create a new directory for our non-parametric results by typing `mkdir 2ndLevel_GroupResults_SnPM`. Select this as the Analysis Directory in the Batch Editor window, and for Images to Analyze, select the smoothed and warped res_minus_chance images, entering `swres` in the filter field to help with your search. Leave the number of permutations at 5000, and change the “Cluster inference” option from `None` to `Yes (slow)`. Leave the rest of the defaults the same.

Next, click on the `Compute` module, click the `Dependency` button, and select the configuration file from the MultiSub module.

Lastly, click on the `Inference` module, select `Dependency` and choose the output from the Compute module. From the `Type of Thresholding` menu, select `Cluster-Level Inference` and change the `Cluster-Forming Threshold` from NaN to 0.001. This will calculate an appropriate cluster-forming threshold with a voxel-wise threshold of 0.001; for more details, see this chapter. You can also choose to write a thresholded statistic image from the `Write thresholded/filtered statistic image?` to generate a statistical map that will only show those voxels that pass the cluster correction you specified. For now, we will set this to `Image name to write`, which will label the output image `SnPM_filtered`.

When you have filled out all of the modules, click the green “Go” button. Since we only have six subjects in this experiment, this should only take a few seconds. When it finishes, you will see a Maximum Intensity Projection (MIP) map, similar to what we observed when viewing the results from an fMRI experiment analyzed with SPM:

There are two major clusters that we see from this analysis: One that covers virtually all of the occipital lobes, with the highest intensity in the ventral occipital and temporal lobes, and another cluster in the left lateral frontal cortex. This replicates our previous analysis using parametric methods, which is reassuring, and it also uses a statistical approach that is more robust to violations of assumptions of normality.

## Next Steps

These non-parametric results can be reported from the MIP table, just as you would for fMRI results. If you generated the thresholded image as well, you can display this in a viewer of your choice, such as AFNI or fsleyes.

As of now (December 18th, 2020), we have not yet covered prevalence tests, which are considered the appropriate method for analyzing classification results (Allefeld et al., 2016). In the meantime, however, nonparametric tests should still be able to give you reliable results, especially if they are found in areas that were predicted by your hypothesis.