Build on ICA Lens

Starting point

Fit ICA on new models.

The most direct extension is to run ICA Lens on newer, larger, multilingual, or domain-specific LLMs. This tests whether the same non-Gaussian structure appears across model families and gives researchers an inspectable basis before model-specific SAE dictionaries exist.
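A minimal sketch of the fitting step. It assumes that activations for one layer of the new model have already been collected into an (n_tokens, d) NumPy array; the collection step is not shown. The sketch implements symmetric FastICA with the tanh (logcosh) contrast in plain NumPy. The function name `fast_ica` and its return convention are illustrative choices, not the ICA Lens API: it returns a reading map R plus the training mean, so that scores are `(X - mean) @ R.T`.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, whose rows are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with the logcosh contrast (g = tanh).

    X: (n_samples, d) activations. Returns (R, mean), where R is a (d, d)
    reading map and the signed scores are S = (X - mean) @ R.T.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Symmetric (ZCA) whitening from the eigendecomposition of the covariance.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    K = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = Xc @ K  # K is symmetric, so this equals Xc @ K.T
    n, d = Z.shape
    W = _sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)  # (n, d) nonlinearity applied to current estimates
        g_prime_mean = (1.0 - g**2).mean(axis=0)
        # Fixed-point update for all rows at once, then re-orthonormalize.
        W_new = _sym_decorrelate(g.T @ Z / n - g_prime_mean[:, None] * W)
        # Converged when each new row is parallel or antiparallel to the old one.
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ K, mean
```

For a real model, the activations would typically be row-normalized first, for example `acts / np.linalg.norm(acts, axis=1, keepdims=True)`. Near-zero covariance eigenvalues may also need to be dropped before whitening. Both steps are left out to keep the sketch short.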

Start with the Qwen3.6-27B one-layer tutorial

Research direction

Go beyond the d-component limit.

Standard compact ICA returns at most d components for a d-dimensional activation space. One way around this hard limit is to fit ICA on different datasets or distributions, then compare the resulting bases in the same activation space. Different corpora may expose different non-Gaussian directions, giving a route toward a richer component inventory without relying on a single overcomplete fit. Another route is overcomplete ICA, which estimates more than d components directly. Other ICA estimators, such as adaptive or deflationary FastICA, Infomax, JADE, and heavy-tail-aware objectives, may also recover different or more reliable directions even within the d-component budget.
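One way to compare fits from two corpora is sketched below. It assumes each fit is summarized by its writing directions, the columns of the pseudoinverse of its reading map (for example `np.linalg.pinv(R_web)` and `np.linalg.pinv(R_code)` for two hypothetical corpora). ICA components are defined only up to sign and order, so directions are matched by absolute cosine similarity. The 0.9 threshold and the function names are hypothetical.

```python
import numpy as np


def unit_columns(M):
    """Scale every column of M to unit L2 norm."""
    return M / np.linalg.norm(M, axis=0, keepdims=True)


def unmatched_directions(A_ref, A_new, threshold=0.9):
    """Indices of columns in A_new with no close match (|cos| >= threshold) in A_ref."""
    sims = np.abs(unit_columns(A_ref).T @ unit_columns(A_new))  # (k_ref, k_new)
    return np.flatnonzero(sims.max(axis=0) < threshold)


def merged_inventory(A_ref, A_new, threshold=0.9):
    """Reference directions plus the new directions that A_ref does not already contain."""
    extra = unmatched_directions(A_ref, A_new, threshold)
    return np.hstack([A_ref, A_new[:, extra]])
```

The merged inventory can have more than d columns, which is the point of the exercise. It is no longer a basis, though, so scores would have to be read per fit rather than through a single inverse.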

Research direction

Automatic annotation for ICA components.

Once an ICA model is fit, the explorer already exposes top examples, opposite-side examples, signed scores, effective receptive field (ERF) estimates, trace plots, and prompt tests. A natural next step is to use this evidence to propose component labels automatically, then hand the candidate labels to humans for verification.
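A minimal sketch of the evidence-gathering half of automatic annotation. It assumes one signed score per example and component, for instance a token's score or the maximum over a sequence. The code selects the top and opposite-side examples for one component and formats them into a prompt for a labeling model. The prompt wording, the function names, and the choice of labeler are hypothetical, and the call to the labeler is not shown.

```python
import numpy as np


def component_evidence(scores, texts, k, top_n=3):
    """Return the top_n highest- and lowest-scoring examples for component k.

    scores: (n_examples, n_components) signed scores; texts: n_examples strings.
    """
    order = np.argsort(scores[:, k])  # ascending by signed score
    top = [(texts[i], float(scores[i, k])) for i in order[::-1][:top_n]]
    opposite = [(texts[i], float(scores[i, k])) for i in order[:top_n]]
    return top, opposite


def labeling_prompt(k, top, opposite):
    """Format the evidence as a prompt asking a labeler model for a candidate label."""
    lines = [f"Component {k}: propose a short label for what drives its positive scores."]
    lines.append("Top examples (highest signed score):")
    lines += [f"  {score:+.2f}  {text}" for text, score in top]
    lines.append("Opposite-side examples (lowest signed score):")
    lines += [f"  {score:+.2f}  {text}" for text, score in opposite]
    lines.append("Reply with one label and a one-sentence justification.")
    return "\n".join(lines)
```

The prompt includes the opposite-side examples on purpose. Because scores are signed, a candidate label can then be checked against both tails before it reaches a human verifier.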

Research direction

Theorize ERF beyond a useful heuristic.

The current Effective Receptive Field (ERF) diagnostic asks how much left context is needed to recover a component's response. Future work can make this more principled: test its robustness, compare recovery criteria, connect ERF to annotation difficulty, and relate a component's contextual scope to the model's computation.
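The ERF question can be phrased as a search over left-context lengths. This sketch assumes a hypothetical `score_fn(context)` that runs the model on a truncated context and returns one component's score at the final position. The recovery criterion is `math.isclose` with a relative and an absolute tolerance, which is only one of the possible criteria that could be compared.

```python
import math


def effective_receptive_field(score_fn, tokens, t, rel_tol=0.1, abs_tol=1e-6):
    """Smallest k such that scoring tokens[t-k+1 : t+1] recovers the full-context score at t.

    score_fn(context) -> score of the last token of `context` for one component.
    """
    full = score_fn(tokens[: t + 1])
    for k in range(1, t + 2):
        partial = score_fn(tokens[t + 1 - k : t + 1])
        if math.isclose(partial, full, rel_tol=rel_tol, abs_tol=abs_tol):
            return k
    return t + 1  # k = t + 1 reuses the full context, so the loop normally returns first


# Toy stand-in for a model: the "component" counts "x" among the last three tokens.
def toy_score(context):
    return float(sum(tok == "x" for tok in context[-3:]))


print(effective_receptive_field(toy_score, list("axbxcx"), t=5))  # prints 3
```

Swapping the `isclose` test for a sign-agreement or rank-based check is one way to compare recovery criteria without touching the search itself.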

Research direction

Scale FastICA to huge open models.

The current pipeline loads activation matrices into memory, which becomes limiting for 27B-scale models and beyond. A scalable ICA Lens would need better memory management: distributed fitting, streaming or blockwise whitening, activation offloading, and algorithms that can move between CPU, disk, and GPU without treating the full dataset as one in-memory matrix.
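Blockwise whitening needs only running first and second moments, so it can stream chunks from disk instead of holding the full activation matrix in memory. The sketch below merges per-chunk means and scatter matrices with the pairwise update of Chan et al., which is more numerically stable than accumulating raw sums of squares. It assumes activations can be served as an iterable of row chunks; the function name is hypothetical.

```python
import numpy as np


def streaming_whitener(chunks, eps=1e-12):
    """Return (mean, K) such that (X - mean) @ K has identity covariance.

    chunks: iterable of (n_i, d) arrays; only one chunk is in memory at a time.
    """
    n, mean, scatter = 0, None, None
    for chunk in chunks:
        c = np.asarray(chunk, dtype=np.float64)
        n_b, mean_b = c.shape[0], c.mean(axis=0)
        scatter_b = (c - mean_b).T @ (c - mean_b)
        if n == 0:
            n, mean, scatter = n_b, mean_b, scatter_b
            continue
        # Chan et al. pairwise merge of (count, mean, scatter) statistics.
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * (n_b / total)
        scatter = scatter + scatter_b + np.outer(delta, delta) * (n * n_b / total)
        n = total
    evals, evecs = np.linalg.eigh(scatter / n)
    K = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return mean, K
```

With a memory-mapped array such as `acts = np.load(path, mmap_mode="r")`, the chunks can be slices `acts[i : i + block]`. The d × d scatter matrix still has to fit in memory. The FastICA iterations themselves would also need a streaming or minibatch variant; this sketch covers only the whitening step.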

Research direction

Fit ICA across activation sites, not only layers.

ICA Lens currently focuses on embeddings and residual-stream states. A broader analysis could include attention outputs, MLP outputs, residual updates, shared bases across layers, or multiple sites at once. This would help study how directions emerge, transform, persist, or disappear through the forward pass.
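One way to track a direction through the forward pass is sketched below. It assumes a single reading map R has been fit on activations pooled from several sites, for example with `np.vstack` over sites. Each site's activations are projected onto the shared basis, and a non-Gaussianity statistic is compared per component. The statistic here is excess kurtosis; the site names and function names are hypothetical.

```python
import numpy as np


def excess_kurtosis(S):
    """Per-column excess kurtosis (0 for a Gaussian, 3 for a Laplace)."""
    Sc = S - S.mean(axis=0)
    var = (Sc**2).mean(axis=0)
    return (Sc**4).mean(axis=0) / var**2 - 3.0


def site_profile(site_acts, R):
    """Excess kurtosis of each shared-basis component at each activation site.

    site_acts: {site_name: (n, d) activations}; R: (k, d) shared reading map.
    """
    return {name: excess_kurtosis(X @ R.T) for name, X in site_acts.items()}
```

A component whose kurtosis is near zero at an early site and large at a later one is a candidate for a direction that emerges between them. This is a heuristic reading, not a test.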

Research direction

Test whether ICA components can become practical steering handles.

SAEBench's targeted probe perturbation (TPP) evaluation suggests that zeroing a small number of ICA coordinates can selectively shift probe-relevant behavior. A practical steering workflow would need to edit signed scores, reconstruct the edit through the writing map (the pseudoinverse of the reading map R), and patch the result back into the residual stream. It would also need to check whether the pseudoinverse is accurate and well-conditioned enough for stable edits outside benchmark probes, and whether such edits hold up against strong task-specific baselines.
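The edit-and-patch loop can be sketched in NumPy. The sketch assumes scores are read linearly as s = R x from the same space the patch is written back into, which may be a centered and row-normalized space. Only the change in scores is written back, so any part of x outside the row space of R is left untouched. The function names and the condition-number threshold are hypothetical placeholders.

```python
import numpy as np


def patch_scores(x, R, edits):
    """Set selected signed scores and return the patched activation.

    x: (d,) activation; R: (k, d) reading map; edits: {component: new_score}.
    """
    W = np.linalg.pinv(R)  # writing map, shape (d, k)
    s = R @ x
    s_target = s.copy()
    for k, value in edits.items():
        s_target[k] = value
    return x + W @ (s_target - s)


def well_conditioned(R, max_cond=1e3):
    """Flag reading maps whose pseudoinverse may amplify edits too much."""
    return bool(np.linalg.cond(R) <= max_cond)
```

Zero-ablating component k is `patch_scores(x, R, {k: 0.0})`. When R has full row rank, `R @ pinv(R)` is the identity, so re-reading the patched vector gives exactly the target scores. The open questions are what happens when R is ill-conditioned and when the edit has to be mapped out of the normalized space.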

Research direction

Develop ICA-SAE hybrid methods.

ICA and SAEs expose related but non-redundant directions in activation space. A hybrid method could combine ICA's compact non-Gaussian basis with SAE-style overcomplete sparse reconstruction, using the strengths of both to improve component discovery, labeling, and intervention.
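One hypothetical shape for a hybrid treats the ICA writing directions and an SAE's decoder atoms as a single dictionary. Each activation is then sparse-coded over that dictionary with orthogonal matching pursuit (OMP). Every reconstruction then reports which part came from the compact ICA basis and which from the overcomplete SAE. OMP here stands in for whatever encoder a real hybrid would train, and the function and matrix names are assumptions.

```python
import numpy as np


def omp(D, x, n_atoms):
    """Orthogonal matching pursuit: greedily pick n_atoms unit-norm columns of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_atoms):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef, residual


def hybrid_encode(x, A_ica, D_sae, n_atoms):
    """Sparse-code x over [ICA writing directions | SAE decoder atoms]."""
    D = np.hstack([A_ica, D_sae])
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    support, coef, residual = omp(D, x, n_atoms)
    n_ica = A_ica.shape[1]
    ica_part = {i: float(c) for i, c in zip(support, coef) if i < n_ica}
    sae_part = {i - n_ica: float(c) for i, c in zip(support, coef) if i >= n_ica}
    return ica_part, sae_part, residual
```

The ICA columns alone already span the space. A practical version would therefore need some preference or penalty that decides when an SAE atom should be used instead. The sketch leaves that choice open.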

Research direction

Apply ICA Lens to vision-language models.

ICA can also be fit on image-text datasets rather than text-only corpora. For VLMs, including image-token activations in the decomposition could help reveal what visual token positions represent in deeper layers and how visual evidence mixes with text through the model.
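A small analysis sketch for the VLM case. It assumes signed scores from an ICA fit that included image-token activations, plus a boolean mask marking which positions are image tokens. How that mask is obtained depends on the model's processor and is not shown. The code ranks components by how much larger their mean absolute score is on image tokens than on text tokens; the statistic and the names are hypothetical.

```python
import numpy as np


def image_leaning_components(scores, is_image, top_n=5):
    """Rank components by mean |score| on image tokens minus mean |score| on text tokens.

    scores: (n_tokens, k) signed ICA scores; is_image: (n_tokens,) boolean mask.
    """
    is_image = np.asarray(is_image, dtype=bool)
    img = np.abs(scores[is_image]).mean(axis=0)
    txt = np.abs(scores[~is_image]).mean(axis=0)
    gap = img - txt
    return np.argsort(gap)[::-1][:top_n], gap
```

Running the same ranking separately per layer would give a rough view of where visual evidence starts to share components with text.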

Research direction

Study activation geometry after normalization.

Row normalization makes ICA fitting more stable and supports the idea that directions carry important structure even when raw activation norms are set aside. This opens a deeper question: how much of LLM representation geometry becomes simpler on the normalized sphere, and what does that reveal about features, contexts, and interventions?
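The normalization factors each activation as x = r u, with r = ||x|| and u on the unit sphere; ICA then sees only u. A minimal sketch, assuming row-wise L2 normalization. Keeping r is what allows a later edit made on u to be mapped back to the original scale. The function name is hypothetical.

```python
import numpy as np


def split_norm_direction(X, eps=1e-8):
    """Factor each row as x = r * u with r = ||x|| and ||u|| = 1."""
    r = np.linalg.norm(X, axis=1)
    U = X / np.maximum(r, eps)[:, None]
    return r, U
```

Rescaling any row by a positive factor changes only its r, not its u. Any structure ICA finds on the sphere is therefore, by construction, structure that survives setting raw norms aside.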

More ideas

This list is meant to grow.

Other directions include auditing SAE labels through ICA-SAE overlap, overcomplete ICA variants, task-specific decompositions, and better interfaces for comparing components across models. The release is intended as infrastructure for these follow-up projects, not only as a static paper artifact.
