Machine Learning Tutorial #8: Non-Parametric Analysis
In the previous chapter, we used a t-test to determine whether classification accuracies differed significantly from chance. While this approach has been used by several researchers in the past (e.g., Li et al., 2009), classification accuracies violate some of the assumptions of the t-test, which is a parametric test. For example, the test assumes that the values are normally distributed around zero; classification accuracies, however, are bounded and can never be negative. (A review of the problems of applying parametric methods to classification results can be found in Allefeld et al., 2016.)
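To make the constraint on accuracies concrete, the sketch below computes the exact distribution of accuracy for a classifier that is guessing at chance. The trial count and chance level (10 trials, two classes) are hypothetical numbers chosen for illustration, not values taken from the Haxby dataset, and the function name is our own.

```python
from math import comb

def chance_accuracy_distribution(n_trials, p_chance):
    """Map each possible accuracy k/n_trials to its probability under guessing."""
    return {
        k / n_trials: comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_trials + 1)
    }

# Hypothetical setup: 10 test trials, two equally likely classes.
dist = chance_accuracy_distribution(n_trials=10, p_chance=0.5)
for accuracy, prob in dist.items():
    print(f"{accuracy:.1f}  {prob:.4f}")
```

The accuracies can only take the values 0.0, 0.1, ..., 1.0, so the distribution is discrete and bounded, unlike the continuous, unbounded normal distribution that the t-test assumes.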
To remedy this, we can use other analyses. Here, we will review how to do non-parametric analyses with a toolbox called Statistical non-Parametric Mapping (SnPM).
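To give a feel for what a non-parametric test does, here is a minimal Python sketch of a one-sample sign-flipping permutation test at a single voxel. It is an illustration under simplifying assumptions, not SnPM's own code: the function name and the six example values are made up, and the mean is used as the test statistic in place of a t statistic.

```python
from itertools import product

def sign_flip_p_value(values):
    """One-sided p-value for mean(values) > 0, using every possible sign flip."""
    n = len(values)
    observed = sum(values) / n
    count = 0
    total = 0
    # Under the null hypothesis each value is equally likely to be + or -,
    # so we rebuild the mean for all 2**n sign assignments.
    for signs in product([1, -1], repeat=n):
        flipped_mean = sum(s * v for s, v in zip(signs, values)) / n
        if flipped_mean >= observed:
            count += 1
        total += 1
    return count / total

# Hypothetical accuracy-minus-chance values for six subjects (made up).
accuracy_minus_chance = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05]
p_value = sign_flip_p_value(accuracy_minus_chance)
print(p_value)
```

Because every example value is positive, only the unflipped labeling reaches the observed mean, so the p-value is 1/64. No assumption about the shape of the distribution is needed beyond symmetry around zero under the null hypothesis.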
SnPM can be downloaded by clicking on this link and clicking on Registration. You will be required to fill out a form with your email address. When you have finished filling out the form, you will be able to download the package. Unzip it, open a Terminal, navigate to the folder ~/spm12/toolbox/, and type:
mv ~/Downloads/SnPM-devel-SnPM13.1.08 SnPM13
Then open a Matlab terminal, click on the Set Path button, and click Add Folder. Select the folder SnPM13 in the toolbox directory, click Save, and then close the window.
Creating the SnPM Batch
From the Matlab terminal, open SPM by typing spm fmri. Click on Batch, then select SPM -> Tools -> SnPM -> Specify -> Multisub: One-sample T-test on diffs/contrasts. This will open a new editor window. Add the following modules as well: SPM -> Tools -> SnPM -> Compute and SPM -> Tools -> SnPM -> Inference.
From the Matlab terminal, navigate to the Haxby_Data directory, and create a new directory for our non-parametric results by typing mkdir 2ndLevel_GroupResults_SnPM. Select this as the Analysis Directory in the Batch Editor window, and for Images to Analyze, select the smoothed and warped res_minus_chance images, entering swres in the filter field to help with your search. Leave the number of permutations at 5000, and change the “Cluster inference” option to Yes (slow). Leave the rest of the defaults the same.
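As a rough check on what 5000 permutations means for this dataset: assuming the one-sample design relabels the data by flipping the sign of each subject's image (the usual scheme for this kind of test), six subjects allow only 2^6 = 64 distinct relabelings. We would therefore expect the requested 5000 to act as a ceiling rather than the number actually run. The helper names below are illustrative and are not part of SnPM.

```python
def n_sign_flips(n_subjects):
    """Number of distinct sign-flip relabelings for a one-sample design."""
    return 2 ** n_subjects

def smallest_p_value(n_subjects):
    """Smallest p-value an exhaustive sign-flip test can produce."""
    return 1 / n_sign_flips(n_subjects)

n_requested = 5000                    # value left in the batch above
n_possible = n_sign_flips(6)          # six subjects in this experiment
n_run = min(n_requested, n_possible)  # assumption: the test is capped at the exhaustive set
print(n_possible, n_run, smallest_p_value(6))
```

With six subjects the smallest attainable p-value is 1/64, or about 0.016, which is still below the conventional 0.05 level.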
Next, click on the Compute module, click the Dependency button, and select the configuration file from the MultiSub module.
Lastly, click on the Inference module, select Dependency and choose the output from the Compute module. From the Type of Thresholding menu, select Cluster-Level Inference and change the Cluster-Forming Threshold from NaN to 0.001. This forms clusters from voxels that pass a voxel-wise threshold of 0.001; for more details, see this chapter. You can also choose to write a thresholded statistic image with the Write thresholded/filtered statistic image? option, which generates a statistical map showing only those voxels that pass the cluster correction you specified. For now, we will turn this option on and fill in Image name to write, which will label the output image.
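The logic of cluster-level inference can be sketched in a few lines of Python. This is a toy version under stated assumptions, not SnPM's implementation: the "brain" is a one-dimensional row of six voxels, the four subject maps are made-up integers, the statistic is the group mean, and the cluster-forming threshold is applied directly to that mean rather than derived from a p-value such as 0.001. All function names are our own.

```python
from itertools import product

def cluster_sizes(stat_map, threshold):
    """Sizes of contiguous runs of voxels whose value exceeds the threshold."""
    sizes, run = [], 0
    for value in stat_map:
        if value > threshold:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes

def mean_map(subject_maps, signs):
    """Voxel-wise group mean after multiplying each subject's map by its sign."""
    n = len(subject_maps)
    n_voxels = len(subject_maps[0])
    return [sum(s * m[v] for s, m in zip(signs, subject_maps)) / n
            for v in range(n_voxels)]

def cluster_p_values(subject_maps, threshold):
    """Pair each observed cluster size with its corrected p-value."""
    n = len(subject_maps)
    observed = cluster_sizes(mean_map(subject_maps, [1] * n), threshold)
    # Null distribution: the largest cluster found under each sign flip.
    null_max = [max(cluster_sizes(mean_map(subject_maps, signs), threshold), default=0)
                for signs in product([1, -1], repeat=n)]
    return [(size, sum(m >= size for m in null_max) / len(null_max))
            for size in observed]

# Made-up data: four subjects, six voxels, with an effect in voxels 1 to 3.
maps = [
    [0, 2, 3, 2, 0, 1],
    [1, 3, 2, 2, -1, 0],
    [0, 2, 4, 3, 1, -1],
    [-1, 3, 3, 2, 0, 0],
]
result = cluster_p_values(maps, threshold=2.0)
print(result)
```

Comparing each observed cluster against the distribution of the largest cluster per relabeling is what controls the error rate across the whole map. In this toy example the three-voxel cluster is matched only by the unflipped labeling, giving a p-value of 1/16.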
When you have filled out all of the modules, click the green “Go” button. Since we only have six subjects in this experiment, this should only take a few seconds. When it finishes, you will see a Maximum Intensity Projection (MIP) map, similar to what we observed when viewing the results from an fMRI experiment analyzed with SPM:
This analysis shows two major clusters: one that covers virtually all of the occipital lobes, with the highest intensity in the ventral occipital and temporal lobes, and another in the left lateral frontal cortex. This replicates our previous analysis using parametric methods, which is reassuring, and it does so with a statistical approach that is more robust to violations of the normality assumption.
These non-parametric results can be reported from the MIP table, just as you would for fMRI results. If you generated the thresholded image as well, you can display this in a viewer of your choice, such as AFNI or fsleyes.
As of now (December 18th, 2020), we have not yet covered prevalence tests, which are considered the appropriate method for analyzing classification results (Allefeld et al., 2016). In the meantime, however, non-parametric tests should still give you reliable results, especially if the significant clusters are found in areas that were predicted by your hypothesis.