Explanations
mathematical expressions and punctuation
Negative Logits
","","        -1.00
zhen          -0.98
primero       -0.96
pso           -0.94
molte         -0.93
dettagli      -0.92
primeiro      -0.90
alers         -0.89
cohol         -0.89
だけではなく  -0.88
Positive Logits
and           0.86
коли          0.84
vilja         0.83
Lastly        0.83
کتاب          0.82
роз           0.81
řské          0.79
comédie       0.79
säga          0.78
ETRIC         0.78
Activations Density 0.031%