Alex Spies

Profile
I'm a PhD student in Machine Learning at Imperial College London, currently working on mechanistic interpretability and world models in transformer networks. I previously did research at Lawrence Berkeley National Laboratory with Benjamin Nachman on pixel detectors for high-energy physics experiments. My work focuses on understanding how neural networks, particularly transformers, learn to represent and reason about structured information. I'm especially interested in using simple domains like maze-solving to reverse engineer the computational strategies these models develop. Using techniques like sparse autoencoders and attention analysis, I work to uncover and intervene on the causal world models that emerge during training.

Recently, I've been exploring how object-centric representations and relational reasoning interact with sparsity constraints to enable more interpretable models. This builds on my broader interest in developing methods to make AI systems more transparent and controllable while maintaining their impressive capabilities.

Publications

Transformers Use Causal World Models in Maze-Solving Tasks

Alex F Spies, William Edwards, Michael I. Ivanitskiy, Adrians Skapars, Tilman Räuker, Katsumi Inoue, Alessandra Russo, Murray Shanahan

Structured World Representations in Maze-Solving Transformers

Michael I. Ivanitskiy, Alex F Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia G. Diniz Behn, Katsumi Inoue, Samy Wu Fung

arXiv.org 2023

A Configurable Library for Generating and Manipulating Maze Datasets

Michael I. Ivanitskiy, Rusheb Shah, Alex F Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia G. Diniz Behn, Samy Wu Fung

arXiv.org 2023

Sparse Relational Reasoning with Object-Centric Representations

Alex F Spies, Alessandra Russo, Murray Shanahan

arXiv.org 2022

Nonlocal thresholds for improving the spatial resolution of pixel detectors

B. Nachman, Alex F Spies

Journal of Instrumentation 2019

Linearly Structured World Representations in Maze-Solving Transformers

Michael I. Ivanitskiy, Alex F Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia G. Diniz Behn, Katsumi Inoue, Samy Wu Fung

UniReps 2023