INDEX
Explanations
negative portrayals and characterizations of specific political groups or individuals
Negative Logits
Token        Logit
ãĤ¡          -0.70
ãĤ©          -0.69
uyomi        -0.64
vironment    -0.60
UTF          -0.60
QUI          -0.59
STEM         -0.57
osterone     -0.57
CAST         -0.55
Alloy        -0.55
Positive Logits
Token        Logit
akening      0.66
imester      0.61
tsy          0.61
vu           0.60
Clyde        0.59
necks        0.59
atari        0.59
reviewed     0.58
Nights       0.56
limb         0.56
Activations Density 0.044%