INDEX
Explanations
references to firearms and the term "gun"
Negative Logits
oha      -0.17
ерин     -0.16
ражд     -0.15
alling   -0.15
urnal    -0.15
©        -0.15
cue      -0.14
usz      -0.14
ustin    -0.14
ally     -0.14
Positive Logits
pow      0.28
ny       0.21
ning     0.21
ned      0.20
linger   0.20
shots    0.19
erals    0.19
metal    0.18
nen      0.18
ners     0.17
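The two lists above read as the ten most negative and ten most positive entries of a per-token score table. A minimal sketch of that ranking step is shown here, using a subset of the values listed above. The page does not state how the scores themselves were computed, so the `logit_effects` mapping and the `top_k` helper are hypothetical names introduced only for illustration.

```python
# Values copied from the lists above (a subset); how they were produced
# is not stated in the source, so this only illustrates the ranking step.
logit_effects = {
    "pow": 0.28, "ny": 0.21, "shots": 0.19, "metal": 0.18,
    "oha": -0.17, "alling": -0.15, "cue": -0.14, "ally": -0.14,
}

def top_k(effects, k, positive=True):
    """Return the k (token, value) pairs with the largest values,
    or with the most negative values when positive=False."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]

top_positive = top_k(logit_effects, 3, positive=True)
top_negative = top_k(logit_effects, 3, positive=False)

for token, value in top_positive + top_negative:
    print(f"{token:<8}{value:+.2f}")
```

Python's sort is stable, so tokens with equal values keep their original order within each list.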
Activations Density 0.024%
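One common reading of an activation density figure is the fraction of sampled token positions on which the feature fires at all. The sketch below assumes that definition, which the page itself does not spell out. The function name, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes density means "share of positions where the feature is active";
    the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 3 active positions out of 12,500 tokens.
sample = [0.0] * 12497 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.024% corresponds to the feature being active on roughly 1 in 4,000 token positions.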