Explanations
This neuron does not respond to any identifiable tokens; it almost never activates on any input.
Negative Logits
Token         Logit
coffin        -0.06
_models       -0.06
="\           -0.06
modifying     -0.06
сфері         -0.06
Terminator    -0.06
Merlin        -0.06
yum           -0.06
jerk          -0.06
розк          -0.06
Positive Logits
Token         Logit
ingt          0.07
mentally      0.07
Appeal        0.07
_CITY         0.07
зобов         0.07
prior         0.06
свидетель     0.06
_was          0.06
onga          0.06
temp          0.06
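The page does not say how these logit lists were produced. A common approach, assumed here, is to take the dot product of the neuron's output vector with each token's unembedding vector, then list the most negative and most positive results. The sketch below illustrates that idea; the function names, the two-dimensional vectors, and the resulting values are all hypothetical and are not the values reported above.

```python
def logit_effects(neuron_out, unembed):
    """Dot the neuron's output vector with each token's unembedding vector."""
    return {
        token: sum(w * u for w, u in zip(neuron_out, vec))
        for token, vec in unembed.items()
    }


def top_negative_and_positive(effects, k):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy data: a 2-dimensional output vector and 4-token vocabulary.
neuron_out = [1.0, -1.0]
unembed = {
    "temp": [0.5, 0.0],
    "coffin": [0.0, 0.5],
    "prior": [0.25, 0.0],
    "yum": [0.0, 0.25],
}

effects = logit_effects(neuron_out, unembed)
negative, positive = top_negative_and_positive(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

When every value in both lists is close to zero, as on this page, the neuron's direct effect on the output distribution is likely weak under this reading.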
Activations Density 0.004%
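Activation density is usually read as the fraction of sampled token positions on which the neuron fires. The sketch below computes it under that assumption; the function name, the zero threshold, and the sample data are hypothetical and chosen only so that the result matches the 0.004% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: one active position out of 25,000.
sample = [0.0] * 24_999 + [0.31]
density = activation_density(sample)
print(f"{density:.3%}")
```

At this density, roughly one token in 25,000 triggers the neuron, which is consistent with an explanation that finds no clear activating pattern.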