Explanations
math equations
The neuron selectively activates on numeric literal tokens (especially floating‐point numbers).
Negative Logits
Token     Logit
+a        -0.06
Grace     -0.06
Kane      -0.06
ctrl      -0.06
admire    -0.06
ela       -0.06
fila      -0.06
Social    -0.06
Pattern   -0.06
task      -0.06
Positive Logits
Token     Logit
�         0.06
rover     0.06
urr       0.06
.Free     0.06
-stack    0.06
Hamas     0.06
grams     0.06
ONS       0.06
explained 0.06
seizures  0.06
Activations Density 0.004%