Explanations
The neuron selectively activates on tokens representing floating-point or decimal numeric values.
Negative Logits
peninsula     -0.06
Hernandez     -0.06
eno           -0.06
pouch         -0.05
imating       -0.05
recognizes    -0.05
signing       -0.05
fold          -0.05
donation      -0.05
-short        -0.05
Positive Logits
é                0.07
طلق              0.07
stalk            0.07
assim            0.07
.isSuccessful    0.06
071              0.06
logits           0.06
している         0.06
coordinate       0.06
ailable          0.06
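The logit lists above rank vocabulary tokens by how strongly this neuron's output pushes each token's logit down or up. A minimal sketch of one common way such lists are computed, assuming the effect is the direct projection of the neuron's output weight vector (hypothetical `w_out`) through the model's unembedding matrix (hypothetical `W_U`), with layer norm and later layers ignored. This is an illustration under those assumptions, not necessarily how the values on this page were produced:

```python
# Sketch only: assumes a neuron's logit effect is its output weight vector
# projected through the unembedding matrix (no layer norm, no later layers).
# `w_out`, `W_U`, and the toy values below are hypothetical.

def logit_effects(w_out, W_U):
    """Return w_out @ W_U: one logit contribution per vocabulary token.

    w_out: list of d_model floats (the neuron's output weights).
    W_U:   d_model rows, each a list of vocab_size floats (unembedding).
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_out[i] * W_U[i][t] for i in range(len(w_out)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k=10):
    """Return (k most negative, k most positive) as (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_out = [1.0, -0.5]
W_U = [
    [0.1, 0.2, -0.3, 0.0],
    [0.4, -0.2, 0.0, 0.1],
]
effects = logit_effects(w_out, W_U)
negative, positive = top_and_bottom(effects, vocab, k=1)
print("negative:", negative)
print("positive:", positive)
```

Because only `w_out` and `W_U` enter the product, this captures direct effects only; any influence the neuron has through later layers is not counted.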
Activation Density: 0.169%
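Activation density is typically the share of token positions on which the neuron fires, so 0.169% means it is active on roughly 1.7 tokens per thousand. A minimal sketch, assuming "fires" means an activation strictly above zero; the page does not state its threshold or evaluation dataset, so both the threshold and the toy data here are hypothetical:

```python
# Sketch only: assumes "activation density" means the fraction of token
# positions where the neuron's activation is strictly above a threshold (0).
# The threshold and the toy data below are hypothetical.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 169 active positions out of 100,000 tokens.
acts = [0.0] * 99_831 + [0.5] * 169
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.169%
```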