Explanations
The neuron is primarily activated by occurrences of the subword “bit” (as in “a bit of…”).
Negative Logits
ckt              -0.07
ně               -0.07
https            -0.06
_threads         -0.06
緊               -0.06
alled            -0.06
ouncement        -0.06
uslim            -0.06
Organizations    -0.06
oled             -0.06
Positive Logits
conco            0.07
правило          0.06
ditch            0.06
쪽지             0.06
aesthetics       0.06
DISP             0.06
胶               0.06
practitioner     0.06
of               0.06
주의             0.06
Activations Density 0.009%
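
To put the density figure in context, the sketch below shows one common way such a number can be computed: the fraction of sampled token positions on which the neuron fires. This is an assumption about the metric, not something the page states. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active. The page itself does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 9 of 100,000 token positions.
sample = [1.0] * 9 + [0.0] * 99_991
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.009%"
```

Under this reading, a density of 0.009% corresponds to roughly 9 active tokens per 100,000 sampled, which fits a neuron tied to a single subword such as “bit”.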