Explanations
This neuron fires on the French definite article “La.”
Negative Logits
Token      Logit
explain    -0.08
Of         -0.08
thù        -0.07
tower      -0.07
_RF        -0.07
of         -0.06
itize      -0.06
ισμ        -0.06
tavsiye    -0.06
devs       -0.06
Positive Logits
Token      Logit
La         0.10
El         0.10
Les        0.09
La         0.08
FormGroup  0.08
El         0.08
Les        0.08
Las        0.07
Le         0.07
Get        0.07
Activations Density: 0.040%