
Sparse Autoencoder


A sparse autoencoder for mechanistic interpretability research.

pip install sparse_autoencoder

Quick Start

Check out the demo notebook for a guide to using this library.

We also highly recommend skimming the reference docs to see all the features that are available.

Features

This library contains:

  1. A sparse autoencoder model, along with all the underlying PyTorch components you need to customise it or build your own.
  2. An activations data generator built on TransformerLens, with the underlying steps exposed in case you want to customise the approach.
  3. An activation resampler to help reduce the number of dead neurons.
  4. Metrics, based on torchmetrics, that log at various stages of training (loss, train metrics and validation metrics).
  5. Training pipeline that combines everything together, allowing you to run hyperparameter sweeps and view progress on wandb.
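
To make the "dead neurons" in item 3 concrete, here is a minimal sketch in plain Python of how they might be identified before resampling. It assumes a dead neuron is a learned feature that never produces a positive activation across a batch of inputs. The function name `find_dead_neurons` and the toy data are hypothetical, and the library's own resampler may use a different criterion and window.

```python
from typing import List, Sequence


def find_dead_neurons(feature_activations: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of features that never fired (activation > 0) in any row.

    Each row holds the post-ReLU hidden activations for one input example.
    Hypothetical helper for illustration, not part of the library's API.
    """
    if not feature_activations:
        return []
    n_features = len(feature_activations[0])
    fire_counts = [0] * n_features
    for row in feature_activations:
        for idx, value in enumerate(row):
            if value > 0.0:
                fire_counts[idx] += 1
    return [idx for idx, count in enumerate(fire_counts) if count == 0]


# Toy batch: 3 inputs, 4 learned features. Features 1 and 3 never fire.
batch = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.2, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
dead = find_dead_neurons(batch)
print(dead)
```

In practice the fire counts would be accumulated over many batches, since a feature that is silent on one small batch may simply be rare rather than dead.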

Designed for Research

The library is designed to be modular. By default it follows the approach from Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, so you can pip install the library and get started quickly. When you need to customise something, you can extend the abstract class for that component (every component is documented to make this easy).
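
The extend-an-abstract-class pattern can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the library's API: `AbstractLoss`, `ReconstructionLoss`, `L1SparsityLoss` and `total_loss` are hypothetical names, and the real components are PyTorch modules with their own signatures (see the reference docs). The two loss terms mirror the usual sparse autoencoder objective of a reconstruction error plus an L1 penalty on the learned activations.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Sequence


class AbstractLoss(ABC):
    """Hypothetical base class (not the library's API) for one loss term."""

    @abstractmethod
    def compute(
        self,
        source: Sequence[float],
        learned: Sequence[float],
        decoded: Sequence[float],
    ) -> float:
        """Return this term's loss for a single example."""


class ReconstructionLoss(AbstractLoss):
    """Mean squared error between the input and its reconstruction."""

    def compute(self, source, learned, decoded) -> float:
        return sum((s - d) ** 2 for s, d in zip(source, decoded)) / len(source)


class L1SparsityLoss(AbstractLoss):
    """L1 penalty on the learned (hidden) activations, scaled by a coefficient."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient

    def compute(self, source, learned, decoded) -> float:
        return self.coefficient * sum(abs(a) for a in learned)


def total_loss(
    terms: Iterable[AbstractLoss],
    source: Sequence[float],
    learned: Sequence[float],
    decoded: Sequence[float],
) -> float:
    """Sum every loss term; swapping a term means passing a different subclass."""
    return sum(term.compute(source, learned, decoded) for term in terms)


source = [1.0, 2.0]         # input activation vector
learned = [0.5, 0.0, 1.5]   # sparse hidden activations
decoded = [1.0, 1.0]        # reconstruction of the input
loss = total_loss([ReconstructionLoss(), L1SparsityLoss(0.1)], source, learned, decoded)
print(loss)
```

Because `total_loss` only depends on the abstract interface, a custom term can be added by subclassing `AbstractLoss` without touching the rest of the code, which is the kind of modularity described above.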