INDEX
Explanations
phrases indicating proportions or fractions
Negative Logits
enko     -0.15
999      -0.15
riad     -0.15
inic     -0.14
enk      -0.14
aley     -0.14
arias    -0.13
rof      -0.13
½        -0.13
ký       -0.13
Positive Logits
third     0.81
third     0.73
THIRD     0.66
-third    0.65
Third     0.64
Third     0.63
fifth     0.62
第三      0.61   (Chinese for "third")
thirds    0.60
_third    0.57
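Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the largest and smallest entries. The sketch below shows that idea under stated assumptions: the names `decoder_vec`, `W_U` and `top_logit_effects` are hypothetical, the toy matrix is invented for illustration, and this is not necessarily how the values on this dashboard were computed (some tools also fold in layer-norm scaling or normalize the result).

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (hypothetical helper).

    decoder_vec: shape (d_model,), the direction the feature writes into the residual stream.
    W_U: shape (d_model, d_vocab), the model's unembedding matrix.
    Returns (top_positive, top_negative) as lists of (token, value) pairs.
    """
    logits = decoder_vec @ W_U          # direct effect on each vocabulary logit
    order = np.argsort(logits)          # indices sorted by ascending logit
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

# Toy example: d_model = 2, a four-token vocabulary (values invented for illustration).
vocab = ["third", "fifth", "enko", "½"]
W_U = np.array([[1.0, 0.5, -1.0, 0.0],
                [0.0, 0.5, 0.0, -2.0]])
decoder_vec = np.array([1.0, 0.0])

top_pos, top_neg = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(top_pos)  # [('third', 1.0), ('fifth', 0.5)]
print(top_neg)  # [('enko', -1.0), ('½', 0.0)]
```

Sorting once with `argsort` and reading both ends gives the positive and negative lists from a single pass over the vocabulary.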
Activation Density: 0.091%
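Activation density is usually read as the fraction of token positions on which the feature fires, so 0.091% corresponds to roughly 91 tokens in every 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero (as with a ReLU-style sparse autoencoder) and using a hypothetical helper name and synthetic activations:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds `threshold`.

    The zero threshold is an assumption; some tools use a small positive cutoff.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Synthetic activations: the feature is active on 91 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:91] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.091%
```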