© Neuronpedia 2026
Neuronpedia
Apollo Research · Taylor · Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
GPT2-Small
Residual Similar CE E2E Recon
6-RES_SCEFR-AJT
3593
Explanations
mentions of legal and political issues along with notable events or actions related to social or public figures
oai_token-act-pair · gpt-3.5-turbo
No Scores
words that indicate a strong personal attachment or possession, often seen by the presence of personal pronouns or possessive adjectives
oai_token-act-pair · gpt-3.5-turbo
Triggered by @danbraun
No Scores
Configuration
neuronpedia/gpt2-small__res_scefr-ajt/6-res_scefr-ajt
Prompts (Dashboard): 12,288 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 46,080
Data Type: torch.float32
Hook Point: blocks.6.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: apollo-research/Skylion007-openwebtext-tokenizer-gpt2
Hook Point Layer: 6
Activation Function: relu
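The configuration above describes a standard sparse autoencoder with a ReLU activation that reads from `blocks.6.hook_resid_pre`. The following NumPy sketch shows what a forward pass of that general shape could look like. It is an illustration under assumptions, not the released implementation: the function name, parameter names, and bias handling are hypothetical, and the toy sizes stand in for GPT2-Small's 768-dimensional residual stream and this dictionary's 46,080 features.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Hypothetical standard SAE: linear encoder, ReLU, linear decoder."""
    pre_acts = x @ W_enc + b_enc      # (batch, n_features)
    acts = np.maximum(pre_acts, 0.0)  # ReLU keeps only non-negative activations
    recon = acts @ W_dec + b_dec      # (batch, d_model)
    return acts, recon

# Toy sizes for illustration; the page lists 46,080 features in float32.
rng = np.random.default_rng(0)
d_model, n_features, batch = 8, 32, 4

x = rng.standard_normal((batch, d_model)).astype(np.float32)
W_enc = rng.standard_normal((d_model, n_features)).astype(np.float32)
W_dec = rng.standard_normal((n_features, d_model)).astype(np.float32)
b_enc = np.zeros(n_features, dtype=np.float32)
b_dec = np.zeros(d_model, dtype=np.float32)

acts, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
```

In this sketch, column 3593 of `acts` would play the role of the feature shown on this page, assuming the real dictionary is indexed the same way.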
Embeds
IFrame
<iframe src="https://www.neuronpedia.org/gpt2-small/6-res_scefr-ajt/3593?embed=true&embedexplanation=true&embedplots=true&embedsteer=true&embedactivations=true&embedlink=true&embedtest=true" title="Neuronpedia" style="height: 300px; width: 540px;"></iframe>
Link
https://www.neuronpedia.org/gpt2-small/6-res_scefr-ajt/3593?embed=true&embedexplanation=true&embedplots=true&embedsteer=true&embedactivations=true&embedlink=true&embedtest=true
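The embed link above is a feature URL plus a set of `embed...=true` query parameters. A small sketch of assembling such a URL with the standard library follows. The helper name `embed_url` is hypothetical, and how the site treats omitted flags is not stated on this page, so the sketch only reproduces the parameters that appear in the link.

```python
from urllib.parse import urlencode

def embed_url(model, source, index, **flags):
    """Build an embed URL; each enabled flag becomes a '<name>=true' parameter."""
    base = f"https://www.neuronpedia.org/{model}/{source}/{index}"
    params = {"embed": "true"}
    # Keyword arguments keep their call order, so the query string order is stable.
    params.update({name: "true" for name, enabled in flags.items() if enabled})
    return f"{base}?{urlencode(params)}"

url = embed_url(
    "gpt2-small", "6-res_scefr-ajt", 3593,
    embedexplanation=True, embedplots=True, embedsteer=True,
    embedactivations=True, embedlink=True, embedtest=True,
)
print(url)
```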
Neuron Alignment
Index   Value   % of L₁
447     +1.58   39.0%
138     +0.44   10.8%
266     +0.01   0.2%
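One plausible reading of the "% of L₁" column is each entry's absolute value divided by the L₁ norm of the feature's full vector. The page does not define the column, so this is an assumption. The sketch below computes that share for a toy vector and backs out the L₁ norm implied by the first listed row. The name `l1_share` is hypothetical.

```python
import numpy as np

def l1_share(weights):
    """Fraction of the vector's L1 norm carried by each entry (assumed definition)."""
    weights = np.asarray(weights, dtype=np.float64)
    return np.abs(weights) / np.abs(weights).sum()

# Toy vector, not the real feature: the shares sum to 1.
toy = [2.0, -1.0, 0.5, 0.5]
shares = l1_share(toy)

# Under this reading, the first row (value 1.58 at 39.0%) implies this L1 norm:
implied_l1 = 1.58 / 0.390
```

With that implied norm of about 4.05, the second row's 0.44 comes out near 10.9%, which roughly matches the listed 10.8% given that the displayed values are rounded.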
Correlated Neurons
Index   P. Corr.   Cos Sim.
138     +1.58      1.00
447     +0.44      0.99
64      +0.01      -0.01
Negative Logits
Token         Value
guiActiveUn   -0.07
é¾            -0.07
ij士           -0.06
elsius        -0.06
oÄŁ           -0.06
Azerb         -0.06
ĪĴ            -0.06
©¶æ           -0.06
ĵĺ            -0.06
¿½            -0.05
Positive Logits
Token   Value
the     0.08
,       0.07
↵       0.07
and     0.07
in      0.07
a       0.07
.       0.07
to      0.07
is      0.07
of      0.07
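Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The page does not state how its values were computed, so the sketch below is an assumption about the general technique. It ignores the final layer norm, and the function name, toy matrix, and toy vocabulary are all hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=3):
    """Rank tokens by the direct logit effect of one feature direction."""
    effects = decoder_dir @ W_U  # one value per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
vocab = ["the", "Azerb", "and"]
W_U = np.array([[0.3, -0.2, 0.1],
                [0.0,  0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])

positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=1)
```

Here `positive` holds the token the toy direction promotes most and `negative` the one it suppresses most.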
Activations
Density 3.390%
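If the density figure is the fraction of tokens on which the feature is active, and if it was measured over the dashboard prompt set listed in the configuration (neither is confirmed on this page), the implied number of active tokens can be estimated with simple arithmetic:

```python
# Figures taken from this page; their interpretation is an assumption.
prompts = 12_288
tokens_per_prompt = 128
density = 0.03390  # 3.390%

total_tokens = prompts * tokens_per_prompt
active_tokens = round(total_tokens * density)

print(total_tokens, active_tokens)
```

Under those assumptions the feature would fire on roughly 53,000 of about 1.57 million tokens.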
No Known Activations