Explanations
mentions of hate crimes and hate speech
mentions of hate crimes and related terminology
Negative Logits
UNCH                     -0.76
aver                     -0.71
amina                    -0.70
enture                   -0.70
clinton                  -0.69
reluct                   -0.69
ITNESS                   -0.68
Prospect                 -0.68
BuyableInstoreAndOnline  -0.68
atel                     -0.67
Positive Logits
fulness                  1.13
crimes                   1.13
speech                   1.05
fully                    1.02
ful                      0.97
crime                    0.95
speech                   0.91
Crimes                   0.88
Speech                   0.88
mobs                     0.88
Activations Density: 0.044%