INDEX
Explanations
words related to communication, information dissemination, and opinions
Negative Logits
addafi     -0.91
zzi        -0.83
Wrestle    -0.82
Breaker    -0.82
Mata       -0.82
atari      -0.80
Able       -0.78
Columb     -0.78
McAuliffe  -0.77
Clemson    -0.76
Positive Logits
manship  1.34
emies    1.31
terday   1.23
umer     1.21
kowski   1.21
burgh    1.19
kamp     1.17
ource    1.15
chen     1.13
chant    1.11
Activations Density 0.586%