INDEX
Explanations
The neuron consistently activates on the country name “Brazil,” on its adjectival and demonym forms (such as “Brazilian” or “Brasil’s”), and on related place names, so it identifies mentions of Brazil.
Negative Logits
Token        Logit
Amb          -0.07
ีข           -0.07
Iron         -0.06
Auch         -0.06
onacci       -0.06
porno        -0.06
DeepCopy     -0.06
untu         -0.06
longitude    -0.06
Telefono     -0.06
Positive Logits
Token        Logit
Brazil       0.11
Brazil       0.10
Brazilian    0.10
brasile      0.08
Brasil       0.08
Janeiro      0.07
:url         0.07
갖           0.07
brazil       0.07
Bras         0.07
Activations Density 0.018%
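To make the density figure concrete, the sketch below converts it into an expected count of active tokens. It assumes that "activations density" means the fraction of tokens on which the neuron has a nonzero activation; the dashboard does not state this definition, and the function name and the 100,000-token sample size are hypothetical choices for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the neuron fires.

    Assumes density is the percentage of tokens with nonzero activation
    (an assumption; the dashboard does not define the term).
    """
    return density_percent / 100.0 * n_tokens


DENSITY_PERCENT = 0.018   # value reported above
SAMPLE_TOKENS = 100_000   # hypothetical sample size

per_sample = expected_active_tokens(DENSITY_PERCENT, SAMPLE_TOKENS)
print(f"about {per_sample:.0f} active tokens per {SAMPLE_TOKENS:,} tokens")
```

Under that assumption, the neuron fires on roughly 18 of every 100,000 tokens, which is consistent with a feature tied to one specific country name.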