Explanations
references to firearms and gun-related terminology
Negative Logits
ерин       -0.18
hra        -0.17
HORT       -0.16
©          -0.15
cue        -0.15
تن         -0.15
bombing    -0.15
merc       -0.15
cloth      -0.15
acus       -0.14
Positive Logits
pow        0.30
ned        0.22
ning       0.22
shots      0.20
metal      0.20
ny         0.20
linger     0.19
erals      0.18
ners       0.18
smith      0.18
Activations Density: 0.023%