INDEX
Explanations
violence-related content
references to violence and serious social issues
Negative Logits
supplemental    -0.71
Specifically    -0.70
dude            -0.67
workout         -0.66
aska            -0.65
ASAP            -0.64
tweaked         -0.64
gee             -0.63
Specifically    -0.63
maximizing      -0.62
Positive Logits
sectarian       0.85
imperialist     0.82
tragedies       0.80
rumours         0.80
testimonies     0.80
politicians     0.80
geop            0.80
neighbouring    0.80
Pakistan        0.79
martyr          0.78
Activations Density 1.330%