Explanations
words related to negative actions or events, including scandals and violence
negative terms or references
Negative Logits
Token        Logit
Dickinson    -0.73
ulhu         -0.71
Norris       -0.61
Rica         -0.58
AVG          -0.55
Gunn         -0.55
Arabian      -0.55
Burr         -0.54
Richards     -0.53
kson         -0.51
Positive Logits
Token        Logit
sized        1.01
based        1.01
level        0.91
themed       0.88
to           0.86
advertising  0.86
style        0.86
friendly     0.83
centric      0.82
oriented     0.82
Activations Density: 0.359%