Explanations
references to incidents of hate crimes and community support in the context of violence
Negative Logits
gua       -0.18
ceptive   -0.15
ıb        -0.15
ê°ģ       -0.15
jam       -0.14
opian     -0.14
_HC       -0.14
COLUMN    -0.14
lya       -0.14
RVA       -0.13
Positive Logits
prelim         0.24
preliminary    0.24
Prel           0.20
initial        0.19
motive         0.18
investigation  0.17
ruled          0.17
Initial        0.17
Initial        0.16
[](            0.16
Activations Density 0.059%