Alex Spies
I'm a PhD student in Machine Learning at Imperial College London, currently working on mechanistic interpretability and world models in transformer networks. I previously did research at Lawrence Berkeley National Laboratory, working with Benjamin Nachman on pixel detectors for high-energy physics experiments.
My work focuses on understanding how neural networks, particularly transformers, learn to represent and reason about structured information. I'm especially interested in using simple domains like maze-solving to reverse engineer the computational strategies these models develop. Through techniques like sparse autoencoders and attention analysis, I work to uncover and intervene on the causal world models that emerge during training.
Recently, I've been exploring how object-centric representations and relational reasoning interact with sparsity constraints to enable more interpretable models. This builds on my broader interest in developing methods to make AI systems more transparent and controllable while maintaining their impressive capabilities.
Publications
Transformers Use Causal World Models in Maze-Solving Tasks
Alex F Spies, William Edwards, Michael I. Ivanitskiy, Adrians Skapars, Tilman Räuker, Katsumi Inoue, Alessandra Russo, Murray Shanahan
Structured World Representations in Maze-Solving Transformers
Michael I. Ivanitskiy, Alex F Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia G. Diniz Behn, Katsumi Inoue, Samy Wu Fung
arXiv.org 2023
A Configurable Library for Generating and Manipulating Maze Datasets
Michael I. Ivanitskiy, Rusheb Shah, Alex F Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia G. Diniz Behn, Samy Wu Fung
arXiv.org 2023
Sparse Relational Reasoning with Object-Centric Representations
Alex F Spies, Alessandra Russo, Murray Shanahan
arXiv.org 2022
Nonlocal thresholds for improving the spatial resolution of pixel detectors
Benjamin Nachman, Alex F Spies
Journal of Instrumentation 2019
Linearly Structured World Representations in Maze-Solving Transformers
Michael I. Ivanitskiy, Alex F Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia G. Diniz Behn, Katsumi Inoue, Samy Wu Fung
UniReps 2023