Explanations
Difficult situations
This neuron activates on tokens expressing first-person perspective, particularly “I” and related self-referential words in personal statements.
Negative Logits
дут
-0.06
Boeing
-0.06
鼠
-0.06
lowes
-0.06
皮
-0.06
cout
-0.06
Samantha
-0.06
_LS
-0.06
丁目
-0.06
mesa
-0.06
Positive Logits
fers
0.07
(""));↵
0.07
.compiler
0.06
Handbook
0.06
аллерг
0.06
musel
0.06
uomini
0.06
lovak
0.06
Fam
0.06
ileceği
0.06
Activations Density 0.078%