Explanations
This neuron detects the definite article “the.”
Negative Logits
koruy           -0.07
苦              -0.07
(rot            -0.07
_B              -0.07
L               -0.07
limiting        -0.07
۸               -0.06
increasingly    -0.06
.app            -0.06
۹               -0.06
Positive Logits
ObjectName      0.06
standoff        0.06
_ioctl          0.06
.httpClient     0.06
Sampler         0.06
defenses        0.06
spouses         0.06
šní             0.06
-the            0.06
ımın            0.06
Activations Density: 0.037%