Explanations
Internet conversations
This neuron detects occurrences of first-person self-references, especially the pronoun “I.”
Negative Logits
escription    -0.07
رده           -0.07
-log          -0.06
-valid        -0.06
printw        -0.06
เจ            -0.06
Carrie        -0.06
Dell          -0.06
pudding       -0.06
uds           -0.06
Positive Logits
directs       0.06
eternity      0.06
инт           0.06
()-           0.06
oop           0.06
refund        0.06
トリ          0.06
.onNext       0.06
Rid           0.06
kees          0.06
Activations Density: 0.036%
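
The page does not define how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled tokens on which the neuron's activation is above zero, might look like the following. The function name, the threshold parameter, and the sample data are all hypothetical and serve only to illustrate the arithmetic.

```python
def activation_density_percent(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation; the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 36 active tokens out of 100,000.
sample = [1.0] * 36 + [0.0] * 99_964
density = activation_density_percent(sample)
print(f"{density:.3f}%")  # prints 0.036%
```

Under this reading, a density of 0.036% means the neuron fires on roughly 36 of every 100,000 tokens, which would be consistent with a sparse, narrowly selective neuron.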