INDEX
Explanations
The neuron is primarily activated by the token “what.”
Negative Logits
-files      -0.07
_fig        -0.07
apus        -0.07
-shift      -0.06
floppy      -0.06
thew        -0.06
Chelsea     -0.06
-white      -0.06
-wrap       -0.06
bölg        -0.06
Positive Logits
_GROUPS         0.07
्               0.07
getAll          0.06
LAS             0.06
Definitely      0.06
.Millisecond    0.06
赖              0.06
(vector         0.06
Controls        0.06
Intercept       0.06
Activations Density: 0.011%
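
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold; that definition, the function name activation_density, the threshold of 0.0, and the sample data are all illustrative assumptions and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 11 of 100,000 tokens.
sample = [1.0] * 11 + [0.0] * 99_989
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.011%
```

Under this reading, a density of 0.011% corresponds to roughly 11 active tokens per 100,000, which fits a neuron that responds mainly to a single token.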