Explanations
phrases indicating progress or development in various contexts
Negative Logits
Token        Logit
Hundred      -0.20
Forty        -0.19
Sevent       -0.18
Fifty        -0.18
Thousands    -0.17
Thirty       -0.17
thousand     -0.17
Thousands    -0.17
Thousand     -0.16
ousand       -0.15
Positive Logits
Token        Logit
three        0.77
four         0.76
five         0.71
six          0.66
seven        0.63
two          0.63
three        0.61
eight        0.60
三个         0.58
nine         0.56
Activations Density: 1.835%