INDEX
Explanations
a mix of words that often appear in natural language text while also giving a higher activation to numbers
Negative Logits
Many    -0.80
many    -0.79
Many    -0.77
many    -0.77
MANY    -0.77
muchas  -0.75
muchos  -0.72
Muchos  -0.71
muitos  -0.71
molte   -0.69
Positive Logits
so      2.77
so      1.93
So      1.55
So      1.47
SO      1.43
så      1.38
così    1.26
ſo      1.18
sooo    1.17
так     1.16
Activations Density 6.443%