INDEX
Explanations
This neuron detects the Spanish preposition “en”.
Negative Logits
dahil      -0.08
caste      -0.07
(PR        -0.07
.ON        -0.07
pleado     -0.06
yoğun      -0.06
yalty      -0.06
striker    -0.06
crud       -0.06
ترجم       -0.06
Positive Logits
conc           0.06
Mounted        0.06
teams          0.06
rules          0.06
intoxicated    0.06
site           0.06
-down          0.06
MSD            0.06
_non           0.06
financially    0.06
Activations Density 0.315%
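
As a rough illustration of what a density figure like this can mean, the sketch below computes the share of token positions on which a neuron's activation is above zero. This reading of "density" is an assumption, not something the page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1,000 token positions, 3 of which have a nonzero activation.
sample = [0.0] * 997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under that reading, a density of 0.315% would correspond to the neuron firing on roughly 3 of every 1,000 token positions.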