INDEX
Explanations
The neuron is sensitive to the token “words”; that is, it fires when the prompt specifies a word-count constraint.
Negative Logits
nargs            -0.06
AnimationFrame   -0.06
updater          -0.06
_layers          -0.06
вход             -0.06
zpráva           -0.06
.fixed           -0.06
ERR              -0.06
UPLOAD           -0.06
Quang            -0.06
Positive Logits
elo              0.07
Curse            0.06
任               0.06
_Session         0.06
fond             0.06
uli              0.06
cruel            0.06
_Current         0.06
Pods             0.06
billig           0.06
Activations Density 0.001%
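
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled token positions on which the neuron's activation exceeds a threshold; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (# active positions) / (# sampled positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 1 of 100,000 token positions.
sample = [0.0] * 99_999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.001%
```

Under this assumed definition, a density of 0.001% corresponds to roughly one active token per 100,000 sampled positions, which would be consistent with a neuron that fires only on a narrow trigger such as a single token.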