INDEX
Explanations
The neuron primarily detects standalone colon tokens (":"), marking section labels or headings in the text.
Negative Logits
hızla	-0.07
uns	-0.07
ally	-0.07
جه	-0.07
ále	-0.07
_keyword	-0.06
část	-0.06
meyi	-0.06
ساله	-0.06
В	-0.06
Positive Logits
奧	0.07
dryer	0.07
zdję	0.06
Bair	0.06
Sem	0.06
toilets	0.06
П	0.06
ея	0.06
duğ	0.06
unlocks	0.06
Activations Density 0.070%