Explanations
confusion
The neuron detects words expressing confusion or perplexity (e.g. “baffled,” “perplexed”).
Negative Logits
Token       Logit
好的         -0.07
dispatcher  -0.07
:item       -0.06
電影         -0.06
.bunifu     -0.06
|array      -0.06
ulares      -0.06
inv         -0.06
Σεπ         -0.06
.correct    -0.06
Positive Logits
Token       Logit
puzzled     0.10
perplex     0.09
baff        0.08
(__         0.07
MET         0.07
vex         0.07
HT          0.07
scrambling  0.06
affle       0.06
bem         0.06
Activations Density 0.010%
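
As a rough illustration of how a density figure like this can be read, the sketch below assumes that "activation density" means the fraction of sampled token positions on which the neuron's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (# positions where the neuron fires) / (# positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 10,000.
sample = [0.0] * 9999 + [2.3]
density = activation_density(sample)
print(f"{density:.3%}")  # formats the fraction as a percentage with 3 decimals
```

Under this reading, a density of 0.010% corresponds to roughly one firing token per 10,000, which fits a neuron that responds only to a narrow set of confusion-related words.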