Explanations
punctuation and text formatting used in titles and citations
Negative Logits
bilt     -0.15
ocene    -0.15
ź        -0.14
ossier   -0.14
é½       -0.14
à¥Ĥष     -0.13
çĵ       -0.13
bjerg    -0.13
ensis    -0.13
nat      -0.13
Positive Logits
eker     0.15
ata      0.14
725      0.14
otta     0.14
ITHER    0.14
cta      0.13
rema     0.13
ither    0.13
eta      0.13
779      0.13
Activations Density: 0.079%