INDEX
Explanations
sum to 100
This neuron detects phrasing around numeric totals summing to 100%, especially rounding-related disclaimers.
Negative Logits
[...]↵↵    -0.07
okie       -0.06
gratitude  -0.06
Roe        -0.06
ト         -0.06
наличие    -0.06
Pole       -0.06
@          -0.06
५          -0.06
@↵↵        -0.06
Positive Logits
violently  0.07
�          0.07
politik    0.06
gil        0.06
readFile   0.06
(gl        0.06
Carb       0.06
bağır      0.06
Turing     0.06
-court     0.06
Activations Density 0.020%