Explanations
The neuron activates on tokens that name where a part sits on a vehicle body, most often “rear” and occasionally “front”.
Negative Logits
Leaks         -0.07
World         -0.07
NUM           -0.06
الحديث        -0.06
repeatedly    -0.06
,the          -0.06
ился          -0.06
وت            -0.06
X             -0.06
Idol          -0.06
Positive Logits
front         0.07
prat          0.07
PosY          0.06
Scientology   0.06
vicinity      0.06
.Region       0.06
classname     0.06
Auth          0.06
icago         0.06
gü            0.06
Activations Density: 0.005%