INDEX
Explanations
references to violent incidents and their societal implications
Negative Logits
idar          -0.14
eroon         -0.14
stanov        -0.14
ç¸            -0.13
uels          -0.13
ierge         -0.13
INSTANCE      -0.13
emark         -0.13
DISCLAIMER    -0.13
矢            -0.13
Positive Logits
headlines     0.51
news          0.46
media         0.43
headline      0.41
coverage      0.40
新闻          0.36
news          0.34
media         0.33
coverage      0.33
-media        0.32
Activations Density 0.336%