Explanations
words related to threatening behavior
phrases related to threats, particularly those of violence or intimidation
Negative Logits
çĦ          -0.84
arist       -0.80
urgy        -0.73
coat        -0.71
puted       -0.71
mys         -0.70
Balanced    -0.70
bits        -0.69
cise        -0.69
éĸ          -0.68
Positive Logits
threats        0.85
posed          0.81
warnings       0.80
threatening    0.78
posters        0.75
threat         0.74
intimidation   0.74
hotline        0.74
threatened     0.73
leveled        0.72
Activations Density 0.023%
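Activation density is usually read as the percentage of tokens on which the feature fires, meaning its activation is above zero. The sketch below shows that calculation. The function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and only illustrate the arithmetic. They are not the dashboard's actual data or code.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes `activations` is a flat list with one
    feature activation per token.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 10,000 tokens, and the feature fires on 2 of them.
acts = [0.0] * 9998 + [3.1, 1.7]
print(f"{activation_density(acts):.3f}%")  # prints 0.020%
```

At a density of 0.023%, the feature fires on about 23 tokens in every 100,000. It is therefore very sparse.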