INDEX
Explanations
references to different kinds of threats, particularly death threats
instances of the word "threats" related to various forms of intimidation or danger
Negative Logits
çĦ            -0.83
arist         -0.81
UX            -0.77
cise          -0.76
bred          -0.73
puted         -0.73
æ©Ł           -0.70
tiny          -0.70
NAS           -0.70
CSS           -0.69
Positive Logits
threats       0.87
threatening   0.83
posed         0.79
against       0.78
retaliation   0.78
repr          0.77
intimidation  0.75
threatened    0.74
leveled       0.72
hotline       0.72
Activations Density 0.030%