Explanations
This neuron fires when the text mentions the place name “London.”
Negative Logits
Token       Logit
ACE         -0.09
Ashe        -0.07
erase       -0.07
grille      -0.07
anime       -0.07
Aff         -0.07
erg         -0.07
Paige       -0.07
aff         -0.07
appe        -0.07
Positive Logits
Token       Logit
London      0.15
London      0.13
Lond        0.10
london      0.08
ONDON       0.07
don         0.07
lender      0.07
ouncil      0.07
.news       0.07
�           0.07   (byte-level token that does not decode to a printable character)
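The two logit lists rank vocabulary tokens by how much this neuron's output raises or lowers their logits. The sketch below shows one common way to produce such a ranking: project the neuron's output direction onto the unembedding matrix and sort. It assumes the names `direction` (the neuron's output vector) and `W_U` (a d_model x vocab_size unembedding matrix) and ignores the final layer norm. These are simplifying assumptions, not necessarily how this dashboard computes its values.

```python
import numpy as np


def top_logit_tokens(direction, W_U, vocab, k=10):
    """Rank vocabulary tokens by their direct logit effect.

    direction: (d_model,) output direction of the neuron (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of token strings indexed by token id.
    Returns (bottom_k, top_k) as lists of (token, effect) pairs.
    """
    # Change in every token's logit per unit of neuron activation,
    # ignoring the final layer norm (a simplifying assumption).
    effects = np.asarray(direction) @ np.asarray(W_U)
    order = np.argsort(effects)  # ascending: most negative first
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return bottom, top


# Toy example: identity unembedding, so the effects equal the direction itself.
vocab = ["London", "Paris", "erase", "anime"]
W_U = np.eye(4)
direction = np.array([0.15, 0.01, -0.07, -0.09])
bottom, top = top_logit_tokens(direction, W_U, vocab, k=2)
print(bottom)  # [('anime', -0.09), ('erase', -0.07)]
print(top)     # [('London', 0.15), ('Paris', 0.01)]
```

In a real model the vocabulary has tens of thousands of entries, so only the extremes (here, ten per side) are shown. This explains why several spellings and fragments of "London" appear together on the positive side.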
Activation Density: 0.006%
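Activation density is the share of sampled token positions on which the neuron fires. A value of 0.006% corresponds to about 6 positions per 100,000 tokens, so this neuron is very sparse. The sketch below is a minimal version of that calculation. It assumes a "firing" position is one whose activation is strictly above a `threshold` of 0. That cutoff is an assumption, and the dashboard's exact rule and sample size are not stated here.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    The zero threshold is an assumption; a dashboard may use another cutoff.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 6 firing positions out of 100,000 tokens gives a density of 0.006%.
acts = np.zeros(100_000)
acts[:6] = 1.0
print(activation_density(acts))  # prints 0.006
```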