Explanations
instances of criticism, particularly related to political figures and their actions
Negative Logits
Token        Logit
oided        -0.92
piercing     -0.68
cheek        -0.67
inyl         -0.65
circum       -0.65
shaving      -0.64
ballistic    -0.64
shave        -0.64
phthal       -0.63
mouth        -0.63
Positive Logits
Token        Logit
immigrants   0.83
Hait         0.80
Sandra       0.80
onda         0.77
Imran        0.76
migrants     0.74
Mexicans     0.73
Hispanics    0.71
GOODMAN      0.71
rapists      0.70
Activations Density: 0.328%