INDEX
Explanations
mentions and discussions of violence, particularly in the context of its impact on various societal issues
Negative Logits
иÑĪ        -0.16
size       -0.15
lify       -0.15
ublish     -0.15
opa        -0.15
akin       -0.15
ikat       -0.15
spy        -0.15
rid        -0.14
istical    -0.14
Positive Logits
directed   0.24
towards    0.23
against    0.23
toward     0.23
Against    0.22
committed  0.20
Tow        0.20
Towards    0.18
/ag        0.18
against    0.17
Activations Density 0.024%