Explanations
The neuron selectively responds to Romance‐language terms built on the root “aprend-,” i.e. words referring to the concept of learning.
Negative Logits
Token     Logit
Fi        -0.07
uali      -0.07
�         -0.07
олі       -0.06
Goal      -0.06
cul       -0.06
rl        -0.06
oul       -0.06
Circle    -0.06
贸        -0.06
Positive Logits
Token     Logit
aprend    0.08
AND       0.07
upp       0.07
-only     0.07
από       0.07
امر       0.07
_apps     0.07
Apprent   0.07
add       0.07
από       0.07
Activations Density: 0.019%