Explanations
difference
This neuron activates on the word "difference", typically in requests to compute or state the result of subtracting one value from another.
Negative Logits
Token           Logit
Anton           -0.07
rotating        -0.07
Salmon          -0.06
om              -0.06
ps              -0.06
sw              -0.06
Node            -0.06
.ru             -0.06
Bot             -0.06
Manor           -0.06
Positive Logits
Token           Logit
replaceAll      0.07
handleMessage   0.07
discontin       0.07
YPRE            0.07
difference      0.07
Difference      0.07
mind            0.06
ythe            0.06
"value          0.06
battle          0.06
Activation density: 0.007%
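The page does not define activation density. The sketch below is a minimal illustration that assumes it is the percentage of dataset tokens on which the neuron's activation exceeds zero. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the viewer. Under that assumption, a density of 0.007% works out to about 7 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical): density = active tokens / all tokens, with
    "active" meaning activation > 0. The real viewer may define it differently.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000.
sample = [0.0] * 99_993 + [1.5] * 7
print(f"{activation_density(sample):.3f}%")  # prints "0.007%"
```

A density this low means the neuron is silent on almost all tokens, which fits its narrow trigger on "difference".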