Explanations
The neuron selectively activates on the Spanish feminine definite article “la.”
Negative Logits
Token         Logit
_iteration    -0.07
Donald        -0.07
ith           -0.07
같            -0.07
zase          -0.07
Pull          -0.06
failure       -0.06
decode        -0.06
_dst          -0.06
Either        -0.06
Positive Logits
Token         Logit
viewType      0.08
CREEN         0.07
افية          0.06
유저          0.06
./            0.06
/page         0.06
стров         0.06
скую          0.06
سين           0.06
شة            0.06
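The page does not say how the logit lists are produced. A common approach, assumed here, is the neuron's direct logit effect: project the neuron's output weight vector onto the unembedding matrix and take the most negative and most positive entries. The sketch below shows that computation on toy data; the function name `top_logit_tokens` and its arguments are hypothetical, and real tools may also apply the final LayerNorm scaling, which this sketch omits.

```python
import numpy as np

def top_logit_tokens(w_out, W_U, vocab, k=10):
    """Rank tokens by a neuron's direct effect on their logits (assumed method).

    w_out: the neuron's output weight vector, shape (d_model,)
    W_U:   unembedding matrix, shape (d_model, vocab_size)
    vocab: token strings, length vocab_size
    """
    # Direct effect of one unit of neuron activation on every token's logit.
    logit_effect = w_out @ W_U  # shape (vocab_size,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and a 4-token vocabulary.
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.7],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_tokens(w_out, W_U, ["a", "b", "c", "d"], k=2)
print("negative:", neg)
print("positive:", pos)
```

With k=10 this yields two ten-entry lists, the same shape as the tables above.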
Activation Density: 0.125%
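Activation density is usually read as the share of token positions on which the neuron fires. The sketch below assumes "fires" means an activation strictly above zero; the page does not state its threshold, and the helper `activation_density` is hypothetical. Under that reading, 0.125% corresponds to roughly one token in 800.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the neuron fires on 1 of 800 tokens.
acts = np.zeros(800)
acts[0] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

Such a low density is consistent with the explanation that the neuron fires selectively on a single token ("la").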