Explanations
negative sentiment/arguments
This neuron detects passive‐voice “is used by” constructions emphasizing legitimate or dual usage (e.g. “is also used by citizens”).
Negative Logits
Token                 Logit
-states               -0.06
harmful               -0.06
規                    -0.06
inds                  -0.06
Nine                  -0.06
riches                -0.06
.assertAlmostEqual    -0.06
303                   -0.06
ircular               -0.06
Kath                  -0.06
Positive Logits
Token                 Logit
-part                 0.06
анк                   0.06
crunchy               0.06
vile                  0.06
compounded            0.06
iflower               0.06
ね                    0.06
butcher               0.06
Spacer                0.06
tucked                0.06
Activations Density: 0.155%
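
The density figure can be read as the share of sampled token positions on which the neuron fires. The page does not define it, so the sketch below assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density`, the `threshold` parameter and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2000 token positions, 3 of which fire.
acts = [0.0] * 2000
acts[10], acts[500], acts[1999] = 0.8, 1.2, 0.3

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.155% would correspond to roughly one or two firing positions per thousand sampled tokens.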