INDEX
Explanations
The neuron never activates on any token, so it does not detect any specific pattern in the text.
Negative Logits
�               -0.07
libertine       -0.07
Purpose         -0.07
.ma             -0.06
odied           -0.06
Male            -0.06
客户            -0.06
Hack            -0.06
collectively    -0.06
FINE            -0.06
Positive Logits
(tv             0.07
$?              0.07
ibName          0.06
.Localization   0.06
เคร             0.06
cantidad        0.06
deviceId        0.06
overshadow      0.06
:System         0.06
lights          0.06
Activations Density 0.027%