Explanations
The neuron responds to uppercase, acronym-style tokens (e.g., “U”, “GH”) in the text.
Negative Logits
licity     -0.08
alom       -0.07
uria       -0.07
оны        -0.07
них        -0.07
框         -0.07
хо         -0.07
woods      -0.07
&M         -0.07
Century    -0.07
Positive Logits
(?,        0.06
"),↵       0.06
%↵↵        0.06
']; ↵      0.06
mağ        0.06
%↵         0.06
@{↵        0.06
(£         0.06
”。↵↵      0.06
↵          0.06
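The two token lists above rank vocabulary tokens by how strongly the neuron pushes their output logits down or up. A minimal sketch of how such lists could be produced, assuming the per-token value is the neuron's output weight vector projected through the unembedding matrix (the dashboard does not state its method; `w_out`, `W_U`, `logit_effects`, and `top_logit_effects` are hypothetical names):

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project the neuron's output weights (length d_model) through an
    unembedding matrix W_U (d_model rows x vocab columns).

    Returns one value per vocabulary token: the neuron's direct effect on
    that token's logit per unit of activation (assumed definition)."""
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logit_effects(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    w_out = [1.0, -0.5]
    W_U = [[0.2, -0.1, 0.0, 0.3],
           [0.4, 0.2, -0.2, 0.1]]
    neg, pos = top_logit_effects(logit_effects(w_out, W_U), ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

In the toy run, token `b` has the most negative effect and `d` the most positive. Ranking by raw projection like this ignores layer norm and any downstream layers, which is one reason such values are usually small (here around ±0.07).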
Activation density: 0.346%
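Activation density reports how often the neuron fires. A minimal sketch, assuming it is the percentage of sampled token positions whose activation exceeds a threshold of zero (the threshold and the `activation_density` helper are hypothetical; the dashboard does not specify them):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above `threshold`."""
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    # 1,000 hypothetical token positions, 3 of which activate the neuron.
    acts = [0.0] * 997 + [0.5, 1.2, 0.1]
    print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 0.346% would mean the neuron is active on roughly 3 to 4 of every 1,000 tokens, consistent with a narrowly selective feature.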