INDEX
Explanations
This neuron fires on the first content word of a free-form paragraph, i.e. at paragraph or section openings.
Negative Logits
Separ     -0.07
-host     -0.07
erve      -0.06
_weak     -0.06
imu       -0.06
idity     -0.06
erving    -0.06
rew       -0.06
rition    -0.06
wcs       -0.06
Positive Logits
뒤         0.07
Tot       0.07
Unsure    0.07
ै.↵       0.06
Lyme      0.06
spielen   0.06
中的       0.06
Τα        0.06
крок      0.06
путем     0.06
Activations Density 0.163%
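
The page does not say how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled token positions on which the neuron's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions that are active
    at all. The dashboard itself does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active positions out of 1000 tokens.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints "0.200%"
```

Under this reading, a density of 0.163% would mean the neuron is active on roughly 1 in every 600 sampled tokens, which fits a feature that responds only at paragraph or section openings.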