Explanations
The neuron activates on floating‐point numeric tokens (e.g., probabilities or small decimal values) in the model’s token stream.
Negative Logits
vistas    -0.07
pt        -0.06
Raid      -0.06
raid      -0.06
iled      -0.06
.None     -0.06
crowned   -0.06
elt       -0.06
lesbians  -0.06
oled      -0.06
Positive Logits
awaiting        0.09
waiting         0.08
for             0.07
reinforcements  0.07
,               0.07
.sax            0.07
awaited         0.07
↵↵              0.07
dis             0.06
?               0.06
Activations Density 0.011%