INDEX
Explanations
This neuron activates on occurrences of the token “train” (i.e. it detects mentions of a train).
Negative Logits
Token     Logit
Kelly     -0.07
elerle    -0.07
caps      -0.07
cort      -0.06
Labor     -0.06
ipple     -0.06
Lily      -0.06
aber      -0.06
Caps      -0.06
oxy       -0.06
Positive Logits
Token     Logit
train     0.16
Train     0.12
trains    0.11
train     0.10
cậu       0.08
(train    0.07
.train    0.07
Train     0.07
Steak     0.07
français  0.07
Activations Density 0.008%
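
To make the density figure concrete, here is a minimal sketch that assumes "activation density" means the fraction of tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is counted per token over a sampled corpus.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 8 tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.008% corresponds to roughly 8 active tokens per 100,000, which fits a neuron that responds to a single specific token.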