Explanations
mentions of violent actions and executions
Negative Logits
diali            -0.47
laude            -0.46
Capricorn        -0.44
hosting          -0.42
ITED             -0.42
않               -0.41
WithIOException  -0.41
متعلقه           -0.41
ild              -0.40
wość             -0.40
Positive Logits
death      1.21
deaths     1.08
death      1.05
DEATH      0.96
posthum    0.92
Death      0.91
mortality  0.91
killing    0.91
Deaths     0.90
suicide    0.89
Activations Density 0.600%