Explanations
phrases indicating comparison or simile
Negative Logits
kate        -0.17
Hayward     -0.16
benh        -0.16
851         -0.15
kovi        -0.15
abay        -0.15
šk          -0.14
クセ        -0.14
Dann        -0.14
akh         -0.14
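Several tokens in these lists come from a byte-level BPE vocabulary, which stores every raw byte as a printable stand-in character, so a multibyte UTF-8 token can surface as mojibake (for example, クセ is stored as ãĤ¯ãĤ» and 可能 as åı¯èĥ½). The tokenizer behind this feature is not named here, so the GPT-2 style byte-to-character table used below is an assumption. The sketch inverts that table and decodes the recovered bytes as UTF-8.

```python
def bytes_to_unicode() -> dict[int, str]:
    # GPT-2 style table (assumed): printable bytes map to themselves,
    # every other byte is shifted to a code point at 256 or above.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a stored byte-level BPE token back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĤ¯ãĤ»", "åı¯èĥ½", "possÃŃvel"]:
    print(tok, "->", decode_token(tok))
```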
Positive Logits
possible    0.32
Possible    0.26
possible    0.24
posible     0.23
_possible   0.23
Possible    0.22
possível    0.21
possibile   0.21
ammo        0.20
可能        0.20
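Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that procedure; `feature_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names, and the toy values stand in for real model weights.

```python
import numpy as np


def top_logit_effects(feature_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction moves their logits.

    feature_dir: (d_model,) decoder vector of the feature (hypothetical input).
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    effects = feature_dir @ W_U  # direct effect on every token's logit
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: only the first residual dimension is active in the feature.
vocab = ["a", "b", "c", "d", "e"]
W_U = np.zeros((4, 5))
W_U[0] = [0.3, -0.2, 0.1, 0.0, -0.05]
feature_dir = np.array([1.0, 0.0, 0.0, 0.0])

neg, pos = top_logit_effects(feature_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because this ranks the raw dot products, it captures only the feature's direct path to the output; any effect routed through later layers is not reflected in such a list.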
Activation Density: 0.027%
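Activation density is usually read as the share of sampled token positions on which the feature fires. Assuming that definition (activation strictly above zero), 0.027% corresponds to about 27 firing positions per 100,000 tokens. `activation_density` and its `threshold` parameter are hypothetical names for this sketch, and the activations below are synthetic.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Synthetic activations: the feature fires on 27 of 100,000 positions.
acts = np.zeros(100_000)
acts[:27] = 0.8
print(f"{activation_density(acts):.3f}%")
```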