Explanations
directed
This neuron activates on occurrences of the word “digraph” (i.e. the substring “dig‐raph”).
Negative Logits
Token       Logit
С           -0.07
enant       -0.07
Ease        -0.07
chnitt      -0.06
backs       -0.06
waitress    -0.06
isiert      -0.06
fourn       -0.06
insurers    -0.06
/\.         -0.06
Positive Logits
Token       Logit
окумент     0.06
614         0.06
ाष          0.06
occult      0.06
Rocket      0.06
stochastic  0.06
venge       0.06
nginx       0.06
agar        0.06
.Remote     0.06
Activations Density: 0.004%