Explanations
The neuron isn’t looking for any particular words or concepts. Instead, it responds to how far into the current text segment a token appears: activation rises as the position moves toward the middle and end of a segment.
Negative Logits
td          -0.07
ером        -0.06
iren        -0.06
ackages     -0.06
.react      -0.06
add         -0.06
ipsoid      -0.06
motion      -0.06
ーマ        -0.05
etmiştir    -0.05
Positive Logits
dildo       0.06
عم          0.06
Clients     0.06
domác       0.06
Optional    0.06
Sloan       0.06
_ASSOC      0.06
한번        0.06
¯Â          0.06
ры          0.06
Activations Density 1.831%
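
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name activation_density, and the toy activation values are all assumptions for illustration; none of them come from this page's underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy, made-up activations for 8 tokens (not data from this neuron).
toy = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 1.2, 0.0]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under that reading, a density of 1.831% would mean the neuron is active on roughly 18 of every 1,000 sampled tokens.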