Explanations
multiple languages
The neuron activates strongly on single-word affirmative replies in several languages (e.g. “Да” in Russian, “Sim” in Portuguese), i.e. short tokens meaning “yes.”
Negative Logits
_HOOK      -0.06
Leo        -0.06
workers    -0.06
nestled    -0.06
Book       -0.06
před       -0.06
Herbert    -0.06
497        -0.06
iset       -0.05
UAE        -0.05
Positive Logits
_FACTOR         0.07
_PID            0.07
trộn            0.07
toItem          0.06
�               0.06
;.              0.06
ormal           0.06
�               0.06
------------↵   0.06
幕              0.06
Activations Density 0.018%
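
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions on which the neuron fires. This definition, the zero threshold, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the figure is computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard does not
    document its exact formula.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 18 of which fire.
acts = [0.0] * 99_982 + [0.5] * 18
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.018%
```

Under this reading, a density of 0.018% means the neuron is active on roughly 18 of every 100,000 tokens, which fits a feature that fires only on rare, short affirmative replies.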