"Do we need to [statistically] model everything?

Returning to an index approach allows true hypothesis testing in complex timeseries"

For most of the history of psychology, data analysis has focused on techniques like ANOVA and regression, which characterize the observed structure in a dataset with respect to a given set of questions or factors. Such models make simplifying assumptions about the data as a whole while allowing targeted questions about the observed differences. However, in the last decade the field has moved to models that start from a different set of assumptions. These models characterize the underlying probability structure that gave rise to all of the data, and then use this structure to make experimental inferences. Nowhere is this contrast more apparent than in the analysis of rich timeseries data such as those generated by techniques like eye-tracking, pupillometry, and EEG.

This talk explores this contrast in the context of a popular technique in psycholinguistics and language acquisition: the visual world paradigm, which uses eye-movements in a semi-naturalistic task to make inferences about the millisecond-by-millisecond dynamics of language processing. Here, the auto-correlated timeseries and the incredibly rich dataset appear to require highly complex analytic approaches, such as generalized additive models and growth curve models, that can fully capture this variance. But is this necessary? This talk challenges this approach on two grounds. First, I present a series of Monte Carlo simulations that ask a basic question: what is the nature of the timeseries that forms the basis of the analysis? While most approaches (including my own) simply average the series of fixations across hundreds of trials, my simulations suggest that the fact that these curves derive from a series of discrete fixations may have profound consequences for how closely they align with the underlying decision process in the language system.
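The averaging step described above can be illustrated with a toy Monte Carlo sketch (a minimal illustration only, not the talk's actual simulations; the logistic "decision" curve, the exponential fixation durations, and all parameter values are assumptions chosen for demonstration). Each simulated trial is a sequence of discrete fixations that "freeze" the eye's state for their duration, and averaging hundreds of such trials yields a smooth proportion-of-looks curve that need not track the underlying curve faithfully:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical underlying decision process: probability that the language
# system favors the target at time t (a logistic ramp; purely illustrative).
t = np.arange(0, 2000, 10)  # time in ms, 10 ms bins
p_target = 1 / (1 + np.exp(-(t - 800) / 150))

def simulate_trial():
    """One trial as a series of discrete fixations. At each saccade the eye
    lands on the target with the probability given by p_target at that
    moment, then stays put until the next saccade (~250 ms mean duration)."""
    looks = np.zeros_like(t)
    i = 0
    while i < len(t):
        dur = max(1, int(rng.exponential(250) / 10))  # fixation length in bins
        on_target = rng.random() < p_target[i]        # sampled once per fixation
        looks[i:i + dur] = on_target                  # state frozen until next saccade
        i += dur
    return looks

# Average hundreds of trials into the familiar fixation-proportion curve.
curve = np.mean([simulate_trial() for _ in range(500)], axis=0)
```

Because each fixation samples the decision state only once and then holds it, the averaged `curve` lags and smears `p_target`, which is the kind of distortion the abstract's simulations probe.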
Without understanding the eye-movement system (and even with it), it may not be safe to assume that these common visualizations, the basis of most complex statistical approaches, reflect the fine-grained dynamics of language. I build from this to argue for a return to an older approach: inference or index approaches to analysis. I illustrate how these can be applied to such data, describe advances that push us beyond simple ANOVA, and present power and reliability analyses from the Monte Carlo simulations suggesting that index approaches can be used with no loss of power and no increase in Type I error. Crucially, these approaches may permit stronger and clearer hypothesis-driven analysis than more complex approaches.