Explanations
numerical values and quantities
Negative Logits
Token         Logit
/             -0.78
↵↵            -0.76
—             -0.76
(not shown)   -0.75
er            -0.75
(             -0.72
"             -0.72
_             -0.72
-             -0.71
“             -0.70
Positive Logits
Token         Logit
eighteen      2.70
nineteen      2.67
seventeen     2.60
fourteen      2.60
sixteen       2.60
thirteen      2.58
fifteen       2.52
twelve        2.39
eighty        2.31
twenty        2.31
Activations Density: 0.149%