INDEX
Explanations
instances of threats or threatening language
Negative Logits
Token            Logit
SequentialGroup  -0.59
createSprite     -0.52
ColumnHeaders    -0.51
createSlice      -0.49
ngths            -0.48
__":             -0.45
ArrowToggle      -0.42
protoimpl        -0.42
ivelany          -0.42
canestro         -0.42
Positive Logits
Token            Logit
threatening      0.93
threaten         0.85
threatening      0.82
threatens        0.71
threatened       0.71
Threat           0.69
amenaz           0.65
amea             0.65
amenaza          0.65
threat           0.64
Activation Density: 0.128%
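
The page does not define the density figure above. A common reading, assumed here, is the fraction of tokens on which the feature's activation is greater than zero. The sketch below shows that arithmetic on toy data; the function name `activation_density`, the `threshold` parameter, and the token counts are all hypothetical, with the counts chosen only so the result matches the reported 0.128%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 100,000 tokens, 128 of which activate the feature.
toy_activations = [0.0] * 99_872 + [1.5] * 128
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.128%
```

Under this reading, a density of 0.128% means the feature fires on roughly 1.3 of every 1,000 tokens, which fits a narrow feature tied to threat-related wording.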