INDEX
Explanations
links and references
This neuron fires on in-text references to a book, i.e. book-citation phrases such as “referred to a book by …”.
Negative Logits
astore        -0.06
competition   -0.06
ύ             -0.06
lent          -0.06
_FD           -0.06
δε            -0.06
Tray          -0.06
栏            -0.06
lexible       -0.06
withheld      -0.06
Positive Logits
attenu        0.07
NE            0.07
SY            0.07
"urls         0.07
almost        0.07
realidad      0.06
THEY          0.06
"These        0.06
рг            0.06
گ             0.06
Activations Density: 0.133%