INDEX
Explanations
claims regarding social and political issues and their perceived impacts
Negative Logits
mate       -0.71
Torrent    -0.67
gren       -0.67
Dynamo     -0.65
brother    -0.64
pleted     -0.63
hess       -0.63
IFE        -0.63
PLUS       -0.63
dies       -0.63
Positive Logits
simplistic      1.20
exagger         1.11
dismiss         1.08
caricature      1.08
blaming         1.07
dispar          1.06
dismissing      1.04
notions         1.03
overly          1.03
explanations    1.02
Activations Density: 0.310%