INDEX
Explanations
weapon-related terms like 'gun', 'rifle', and 'knife'
nouns related to weapons and objects associated with violence or crime
Negative Logits
Helpful         -0.85
______          -0.65
Flavoring       -0.65
erenn           -0.63
4090            -0.63
ilaterally      -0.62
LLP             -0.60
sqor            -0.60
EStreamFrame    -0.60
___             -0.59

Positive Logits
disappears      1.06
wasn            1.05
belonged        1.00
belongs         1.00
hasn            0.95
isn             0.95
was             0.94
doesn           0.90
appears         0.89
explodes        0.88
Activations Density: 0.527%