Explanations
The neuron activates on mentions of writing or speaking English fluently (e.g. “English fluently”).
Negative Logits
stin       -0.07
stra       -0.07
Memories   -0.06
ا�         -0.06
ا          -0.06
argo       -0.06
.routing   -0.06
Sara       -0.06
Et         -0.06
diffé      -0.06
Positive Logits
control           0.07
.Generated        0.06
BBBB              0.06
($__              0.06
zůst              0.06
(QStringLiteral   0.06
國                0.06
Zh                0.06
ODE               0.06
THIRD             0.06
Activations Density: 0.014%