Explanations
phrases or words related to threatening actions or statements made by individuals
instances of threats or intimidation
Negative Logits
Token        Logit
gart         -0.89
æ©Ł          -0.77
rite         -0.75
mys          -0.71
_>           -0.70
served       -0.70
cedented     -0.69
prototype    -0.69
phant        -0.68
dx           -0.67
Positive Logits
Token          Logit
retaliation    0.99
repr           0.91
to             0.90
eviction       0.90
retribution    0.88
violence       0.87
expulsion      0.85
termination    0.84
suicide        0.82
annihilation   0.81
Activations Density: 0.039%