INDEX
Explanations
phrases emphasizing abundance and quantity
Negative Logits
dana      -0.16
arend     -0.15
Umb       -0.15
Tal       -0.15
utar      -0.14
pte       -0.14
acher     -0.14
ически    -0.14
neg       -0.13
optera    -0.13
Positive Logits
etc       0.22
وغير      0.21
plus      0.20
plus      0.18
many      0.17
etc       0.17
等        0.16
等        0.16
among     0.16
Plus      0.16
Activations Density 0.214%