Explanations
phrases related to controversial or impactful statements made by individuals
references to public statements or comments
Negative Logits
ccording    -0.78
bid         -0.77
PG          -0.77
tis         -0.77
wick        -0.67
Mech        -0.64
Craw        -0.63
tails       -0.62
erection    -0.62
tail        -0.62
Positive Logits
uttered        1.15
regarding      1.11
about          1.07
dispar         1.04
aloud          1.00
implying       0.99
remarks        0.99
concerning     0.97
criticizing    0.94
praising       0.94
Activations Density 0.087%
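
The density figure above can be read as the share of sampled tokens on which the feature fires at all. The page does not define it, so that reading is an assumption. The sketch below shows the arithmetic under that assumption; the function name `activation_density`, the `threshold` parameter, and the toy sample are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active (activation > threshold). The source page does not
    state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the source): 87 active tokens out of 100,000.
sample = [1.0] * 87 + [0.0] * 99_913
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: under the stated assumption, it is active on fewer than one token in a thousand.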