INDEX
Explanations
This neuron activates on mentions of “user” (including tokens like user, username, and the user role in headers).
Negative Logits
Zimmerman      -0.07
университ      -0.07
[code          -0.06
goals          -0.06
governments    -0.06
Phillip        -0.06
Wer            -0.06
fiction        -0.06
타이           -0.06
лишком         -0.06
Positive Logits
.QRect         0.07
Use            0.07
acerb          0.07
bitwise        0.06
_decay         0.06
adí            0.06
scé            0.06
視             0.06
backdrop       0.06
sem            0.06
Activations Density: 0.034%