Explanations
mentions of violent attacks and their consequences
Negative Logits
lean          -0.17
onium         -0.15
agini         -0.15
zim           -0.14
agine         -0.14
umi           -0.14
Īĺ            -0.14
rana          -0.14
mins          -0.14
compliment    -0.14
Positive Logits
odyn          0.15
iya           0.15
Yard          0.15
etí           0.14
eks           0.14
aleb          0.14
Kick          0.13
khúc          0.13
дел           0.13
latest        0.13
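On feature dashboards like this one, each logit value is usually the feature's decoder direction projected through the model's unembedding matrix, and the lists show the most negative and most positive entries. The source does not say how these values were computed, so treat that as an assumption. Under it, the sketch below shows how such bottom-k and top-k lists could be produced. The names `logit_effects`, `w_dec_row`, and `W_U` are hypothetical.

```python
import numpy as np

def logit_effects(w_dec_row, W_U, k=10):
    """Return the k most negative and k most positive token effects.

    Assumption (not from the source): the effect of a feature on token t is
    its decoder direction w_dec_row (shape: d_model) dotted with column t
    of the unembedding matrix W_U (shape: d_model x vocab_size).
    """
    effects = w_dec_row @ W_U           # shape: (vocab_size,)
    order = np.argsort(effects)         # ascending: most negative first
    bottom = [(int(i), float(effects[i])) for i in order[:k]]
    top = [(int(i), float(effects[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy usage with a 3-token "vocabulary" and an identity unembedding.
bottom, top = logit_effects(np.array([1.0, -2.0, 0.5]), np.eye(3), k=1)
print(bottom, top)
```

Token ids would then be mapped back to strings with the model's tokenizer to obtain entries like "lean" or "odyn".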
Activation Density: 0.047%
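Activation density is commonly reported as the fraction of token positions on which the feature fires. Under that assumption, 0.047% means the feature is active on roughly 47 of every 100,000 tokens (100,000 × 0.00047 = 47), which fits a narrowly scoped explanation such as mentions of violent attacks. The helper `activation_density` and its zero threshold are illustrative assumptions, not the dashboard's documented method.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumes acts holds one (non-negative, e.g. ReLU) activation per token.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy usage: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))
```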