Explanations
This neuron almost never activates on text (activation density 0.002%), and no token pattern that it detects or responds to has been identified.
Negative Logits
Info        -0.07
ZO          -0.07
či          -0.07
diets       -0.06
adio        -0.06
payments    -0.06
IO          -0.06
řekla       -0.06
Io          -0.06
누          -0.06
Positive Logits
warrant     0.10
warrants    0.09
warranted   0.08
Sent        0.08
WARRANT     0.08
Dexter      0.07
wre         0.07
UT          0.07
Worldwide   0.07
�           0.07
Activations Density 0.002%
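For a sense of scale, a density figure like this can be reproduced with a few lines of arithmetic. The sketch below assumes that activation density means the share of sampled tokens on which the neuron's activation is above zero; the page does not define the metric, so treat that definition, the function name `activation_density`, and the sample data as hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = active tokens / total tokens, as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the neuron fires on 2 tokens out of 100,000.
sample = [0.0] * 99_998 + [0.4, 1.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.002% corresponds to roughly one active token in every 50,000.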