Explanations
This neuron responds to structural or metadata markers (e.g., system/user/assistant role tags and special boundary tokens) rather than to the surrounding text content.
Negative Logits
abbreviation  -0.07
Ages          -0.07
Mona          -0.07
anymore       -0.07
,"%           -0.06
ifes          -0.06
Alley         -0.06
recovered     -0.06
pione         -0.06
eldo          -0.06
Positive Logits
.DateField    0.06
endforeach    0.06
(unrendered)  0.06
selon         0.06
Https         0.06
едини         0.06
RESP          0.06
후            0.06
tienen        0.06
helm          0.06
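Logit lists like these are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and ranking the resulting per-token effects. The page does not say how its values were computed, so what follows is only a minimal Python sketch assuming that direct projection (and ignoring the final layer norm). The names w_out and W_U, the toy vocabulary, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a neuron's output weights through the unembedding matrix.

    w_out: hypothetical output weights of the neuron, length d_model.
    W_U:   hypothetical unembedding, d_model rows of length vocab_size.
    Returns the direct logit effect on each vocabulary token
    (the final layer norm is ignored in this sketch).
    """
    if len(w_out) != len(W_U):
        raise ValueError("w_out and W_U must share d_model")
    vocab_size = len(W_U[0])
    return [
        sum(w * row[t] for w, row in zip(w_out, W_U))
        for t in range(vocab_size)
    ]


def top_logits(
    vocab: Sequence[str], effects: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]            # smallest (most negative) effects first
    positive = ranked[::-1][:k]      # largest (most positive) effects first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (all values made up).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_out = [1.0, -0.5]
W_U = [
    [-0.05, -0.03, 0.04, 0.02],
    [0.04, 0.06, -0.02, -0.04],
]
effects = logit_effects(w_out, W_U)
negative, positive = top_logits(vocab, effects, k=2)
print("negative:", [(tok, round(v, 2)) for tok, v in negative])
print("positive:", [(tok, round(v, 2)) for tok, v in positive])
```

Small magnitudes such as the ±0.06 and ±0.07 shown here would mean the neuron barely shifts any single token's logit directly, which fits an explanation centered on structural markers rather than on specific output tokens.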
Activation Density: 0.098%
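Activation density is usually read as the share of dataset tokens on which the neuron fires. Assuming "fires" means an activation strictly above zero (the page does not state its threshold), 0.098% corresponds to roughly 98 activating tokens per 100,000, so the neuron is very sparse. A minimal sketch of that calculation, using made-up activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    The default threshold of 0.0 is an assumption; the page does not define one.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up data: 98 firing tokens out of 100,000.
acts = [0.0] * 99_902 + [1.3] * 98
print(f"{activation_density(acts):.3f}%")  # prints 0.098%
```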