INDEX
Explanations
expressions of opinion or commentary
Negative Logits
nam       -0.17
Bias      -0.15
ç±į       -0.14
erte      -0.13
439       -0.13
519       -0.13
654       -0.13
hardly    -0.13
ilha      -0.13
agas      -0.13
Positive Logits
icipants          0.16
loquent           0.15
fuck              0.15
erosis            0.15
olarity           0.14
ispens            0.14
enment            0.14
Bun               0.14
bdsm              0.13
ValueCollection   0.13
Activations Density: 0.000%