INDEX
Explanations
mentions of firearms or weapons, particularly focusing on guns
references to guns
Negative Logits
grad        -0.72
Fair        -0.69
Attempts    -0.66
lihood      -0.65
spect       -0.64
UTE         -0.64
Work        -0.64
Benef       -0.63
Solution    -0.62
eff         -0.61
Positive Logits
linger      1.34
blazing     1.21
hips        1.15
guns        1.12
mith        1.09
powder      1.06
poons       0.97
hops        0.96
hooting     0.95
hip         0.93
Activations Density 0.016%