INDEX
Explanations
concepts related to threats and intimidation
references to the concept of "threat."
Negative Logits
mys     -0.91
ilts    -0.82
algia   -0.81
val     -0.81
iked    -0.77
urgy    -0.76
cle     -0.74
ann     -0.71
rain    -0.71
gart    -0.70
Positive Logits
threaten      1.13
threatens     1.11
threats       0.98
endanger      0.94
threatened    0.92
threatening   0.91
challeng      0.91
Threat        0.91
threat        0.90
menacing      0.85
Activations Density: 0.006%