Explanations
The neuron activates on words that express selecting or configuring options (e.g., “choose,” “pick,” “design,” “want”).
Negative Logits
_smooth        -0.07
erer           -0.06
FromArray      -0.06
_Position      -0.06
|_|            -0.06
Collapse       -0.06
amics          -0.06
_multiplier    -0.06
spins          -0.06
thành          -0.06
Positive Logits
();)           0.08
">--}}↵        0.07
záb            0.06
щается         0.06
ENCE           0.06
EVENT          0.06
выб            0.06
Trans          0.06
істор          0.06
Arr            0.06
Activations Density: 0.049%