"Syntactic generalization in artificial neural networks"

Abstract:
Systems based on artificial neural networks have proved to be highly effective in applications such as machine translation. These advances are surprising from a cognitive point of view: these networks are not designed with the structural inductive biases that are often viewed as necessary for the acquisition of syntax. Yet success in applications speaks to cognitive questions in an indirect way at best, as engineering benchmarks do not reflect the nuanced generalizations that have motivated the adoption of structural inductive biases. In this talk, I will use established psycholinguistic paradigms to examine the syntactic generalization capabilities of artificial neural networks, and compare the networks' behavior to that of humans.
I will show that recurrent neural networks trained to predict upcoming words can overcome their sequential bias and acquire structure-sensitive dependencies such as subject-verb agreement; however, the networks' performance degrades sharply on more complex sentences. I will then demonstrate how networks with explicit syntactic structure can be used to test for the necessity (and sufficiency) of structural inductive biases, focusing on the classic case of subject-auxiliary inversion in English question formation. Overall, I will conclude that explicit structural biases are still necessary for human-like generalization in a learner trained on text only.