INDEX
Explanations
This neuron fires on discourse markers that introduce or enumerate new types or categories (e.g., “Another,” “type,” “various,” “In”).
Negative Logits
(redis      -0.07
_already    -0.07
тепер       -0.06
yerine      -0.06
oor         -0.06
irler       -0.06
queue       -0.06
Pad         -0.06
pq          -0.06
ilere       -0.06
Positive Logits
H           0.07
>--}}↵      0.06
ED          0.06
-tier       0.06
uggest      0.06
_Class      0.06
Sandy       0.06
JT          0.06
Coffee      0.06
'"';↵       0.06
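The page does not say how these logit lists are produced. A common approach, assumed here rather than confirmed by the source, is direct logit attribution: take the neuron's output weight vector, project it through the unembedding matrix, and list the tokens with the largest and smallest scores. The minimal sketch below uses pure Python; the function name `top_logit_effects` and its arguments are hypothetical.

```python
import heapq

def top_logit_effects(direction, unembed, vocab, k=10):
    """Hypothetical direct-logit-attribution sketch.

    direction: the neuron's output weight vector (length d_model).
    unembed:   d_model x vocab_size matrix as a list of rows.
    vocab:     token strings, one per unembedding column.
    Returns (positive, negative): the k highest and k lowest (score, token) pairs.
    """
    scores = []
    for tok_idx, tok in enumerate(vocab):
        # Dot product of the neuron's direction with column tok_idx of the unembedding.
        s = sum(direction[d] * unembed[d][tok_idx] for d in range(len(direction)))
        scores.append((s, tok))
    positive = heapq.nlargest(k, scores)   # tokens the neuron pushes up most
    negative = heapq.nsmallest(k, scores)  # tokens the neuron pushes down most
    return positive, negative
```

With a toy two-dimensional model, `top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [0.0, 0.0, 0.0]], ["H", "queue", "ED"], k=1)` returns `([(0.5, 'H')], [(-0.2, 'queue')])`.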
Activation density: 0.039%
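Activation density is read here as the percentage of token positions on which the neuron is active. The sketch below assumes "active" means an activation strictly above zero; the threshold, like the function name `activation_density`, is a hypothetical choice, not one stated on this page.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    The default threshold of 0.0 is an assumption for illustration.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 0.3, 0.0, 0.0])` returns `25.0`. On that reading, a density of 0.039% means the neuron fires on roughly 39 of every 100,000 tokens.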