INDEX
Explanations
instances of "I" to identify self-referential expressions
Negative Logits
olik         -0.17
.Formatting  -0.17
zyst         -0.16
rž           -0.14
аниÑĨ        -0.14
é«ĺæ¸ħ       -0.14
ίνη          -0.14
åŁ           -0.14
iteral       -0.14
ych          -0.13
Positive Logits
nn      0.15
rip     0.15
loved   0.15
ı       0.14
entes   0.14
_dl     0.14
entin   0.14
Daly    0.14
Entity  0.14
Fauc    0.14
Activations Density 0.252%