INDEX
Explanations
statements reflecting personal feelings or experiences
Negative Logits
Token        Logit
darn         -0.15
others       -0.15
413          -0.15
inde         -0.14
uby          -0.14
125          -0.14
727          -0.14
éį           -0.14
427          -0.14
we           -0.14
Positive Logits
Token        Logit
ascimento    0.17
&m           0.16
dna          0.16
Äĥn          0.16
ãĥ³ãĥĨ       0.15
cete         0.15
bject        0.15
addtogroup   0.15
permission   0.14
iegel        0.14
Activations Density: 0.058%