INDEX
Explanations
words related to opinions or beliefs
Negative Logits
"},"               -0.75
etter              -0.63
artney             -0.62
natureconservancy  -0.62
ropolis            -0.62
rosis              -0.61
DN                 -0.61
debian             -0.60
dL                 -0.60
uits               -0.59
Positive Logits
yourselves         0.72
yours              0.71
beit               0.69
ably               0.69
kidding            0.66
me                 0.66
guessing           0.63
ingly              0.62
fax                0.62
thy                0.62
Activations Density: 0.114%