INDEX
Explanations
phrases indicating claims or statements of existence and being
Negative Logits
Token     Logit
loy       -0.14
Blonde    -0.14
tend      -0.14
kil       -0.13
eral      -0.13
ccount    -0.13
lein      -0.13
Deniz     -0.13
334       -0.13
RT        -0.13
Positive Logits
Token     Logit
be        0.22
have      0.17
contrary  0.16
.have     0.15
iani      0.15
вар       0.15
oria      0.15
avern     0.15
iembre    0.15
rades     0.14
Activations Density 0.059%
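
The page does not say how the density figure is computed. A common reading, assumed here, is the percentage of token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density means "share of token positions where the
    feature fires"; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which fire.
sample = [0.0] * 9994 + [0.8, 1.3, 0.2, 2.1, 0.5, 0.9]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.059% would mean the feature fires on roughly 6 of every 10,000 token positions.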