INDEX
Explanations
statements beginning with the pronoun "I"
Negative Logits
ir          -0.17
act         -0.17
le          -0.16
asca        -0.16
ingen       -0.15
erna        -0.15
forth       -0.15
ItemCount   -0.15
eb          -0.14
ya          -0.14
Positive Logits
vrier       0.14
agh         0.14
upo         0.13
цес         0.13
alış        0.13
Lor         0.13
mac         0.13
emek        0.13
dib         0.13
pu          0.13
Activations Density 0.117%