Explanations
languages
The neuron fires on occurrences of language names (e.g. “Portuguese,” “Spanish,” “French,” “Vietnamese,” “Russian”).
Negative Logits
Raf      -0.07
latin    -0.07
aza      -0.06
seudo    -0.06
fw       -0.06
Platz    -0.06
flats    -0.06
rats     -0.06
ason     -0.06
цит      -0.06
Positive Logits
conveying    0.07
заяви        0.06
�            0.06
엄마         0.06
SAN          0.06
sausage      0.06
країн        0.06
couleur      0.06
ICIENT       0.06
.me          0.06
Activations Density 0.031%
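
The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.030%"
```

Under this reading, a density of 0.031% would mean the neuron is active on roughly 3 of every 10,000 tokens, which fits a feature that fires only on language names.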