INDEX
Explanations
references to political and social issues
Negative Logits
aws          -0.77
ARS          -0.76
ickets       -0.73
unk          -0.72
isms         -0.71
gans         -0.70
icons        -0.69
adle         -0.69
ivas         -0.69
acers        -0.69
Positive Logits
week         1.13
year         1.00
latest       1.00
trope        0.99
particular   0.98
article      0.98
month        0.95
weekend      0.92
incarnation  0.90
arrangement  0.88
Activations Density: 0.902%