Explanations
words related to political events or social issues
Negative Logits
"$:/          -0.70
affiliation   -0.63
connection    -0.62
radiant       -0.61
OTO           -0.60
overturn      -0.60
FANTASY       -0.59
Â             -0.59
neutral       -0.58
appe          -0.58
Positive Logits
ums           4.81
um            2.21
UM            1.90
ummies        1.36
ummy          1.31
umn           1.31
umps          1.22
ummer         1.22
umb           1.15
umm           1.15
Activations Density 0.008%