INDEX
Explanations
The neuron activates whenever it sees the name of a programming language.
Negative Logits
three         -0.08
four          -0.07
five          -0.07
特            -0.07
qm            -0.06
átu           -0.06
strengthen    -0.06
tří           -0.06
Tournament    -0.06
büny          -0.06
Positive Logits
AW            0.08
AGAIN         0.06
bás           0.06
any           0.06
аліз          0.06
aw            0.06
(Index        0.06
ALL           0.06
finally       0.06
visibility    0.06
Activations Density 0.037%
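
For readers unfamiliar with the density figure, the sketch below shows one way such a number could be computed. It assumes density means the fraction of sampled tokens on which the neuron's activation is above zero; the page itself does not state the definition, and the function name `activation_density`, the threshold, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 10,000 tokens: 4 fire, the rest are zero.
sample = [0.0] * 9996 + [1.2, 0.4, 2.7, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.037% would correspond to the neuron firing on roughly 37 of every 100,000 tokens.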