INDEX
Explanations
The neuron activates on numeric tokens—especially floating‐point numbers.
Negative Logits
righteousness                     -0.06
~~~~~~~~~~~~~~~~                  -0.06
SIDE                              -0.06
.SQL                              -0.06
conference                        -0.06
Bers                              -0.06
(By                               -0.06
--------------------------------  -0.06
Warn                              -0.06
Virgin                            -0.06
Positive Logits
�                                 0.08
escort                            0.07
diffic                            0.07
Jungle                            0.06
kappa                             0.06
-value                            0.06
secret                            0.06
MISSING                           0.06
-launch                           0.06
osh                               0.06
Activations Density: 0.008%