INDEX
Explanations
This neuron responds to words indicating rearward or backward direction or positioning.
Negative Logits
/index          -0.06
deviation       -0.06
.cy             -0.06
Rock            -0.06
Ty              -0.06
Likes           -0.06
ありがとうござ  -0.06
surf            -0.06
Thing           -0.06
Story           -0.06
Positive Logits
rear            0.10
Rear            0.09
_td             0.07
ilinx           0.07
fot             0.07
geri            0.07
rim             0.07
elegant         0.07
safeg           0.07
엄              0.06
Activations Density 0.014%