Explanations
punctuation
This neuron highlights numeric tokens, particularly rating-scale numbers (e.g., 1, 60, 80, 100) and decimal values used in score/evaluation prompts.
Negative Logits
�  -0.07
78  -0.07
Ά  -0.07
sediment  -0.07
stripes  -0.07
Grid  -0.07
�  -0.07
worms  -0.06
Keith  -0.06
kills  -0.06
Positive Logits
andbox  0.06
Çin  0.06
Wak  0.06
기반  0.06
_um  0.05
envoy  0.05
USART  0.05
discriminate  0.05
ований  0.05
іння  0.05
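Logit lists like the two above are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and ranking tokens by the result. The page does not say how its lists were computed, so the following is a minimal sketch under that assumption. `direction` (the neuron's output vector), `W_U` (a d_model x vocab unembedding matrix), and the four-token vocabulary are hypothetical toy values.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a d_model-sized output direction through W_U (d_model x vocab).

    Assumption: the per-token logit effect is the dot product of the neuron's
    output direction with each token's unembedding column.
    """
    vocab_size = len(W_U[0])
    return [
        sum(direction[i] * W_U[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
tokens = ["a", "b", "c", "d"]
W_U = [[1.0, -1.0, 0.5, 0.0],
       [0.0, 2.0, -0.5, 1.0]]
direction = [0.1, -0.05]

effects = logit_effects(direction, W_U)
negative, positive = top_and_bottom(effects, tokens, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Values as small as the ±0.05 to ±0.07 shown above would suggest the neuron's direct effect on any single output token is weak. That is also consistent with the unrelated mix of tokens in both lists.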
Activation Density: 0.005%
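Activation density is usually read as the fraction of tokens on which the neuron fires. At 0.005%, that is about 1 token in 20,000. The sketch below makes the arithmetic concrete. It assumes "fires" means an activation strictly above a threshold of 0, which the page does not specify. The activation list is made-up data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up data: one firing token among 20,000.
acts = [0.0] * 19_999 + [1.3]
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low suggests a highly selective neuron, which fits the narrow numeric-rating explanation above.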