INDEX
Explanations
phrases indicating positive quality or approval
Negative Logits
Sour        -0.18
oras        -0.16
overs       -0.15
oops        -0.15
escaping    -0.14
atik        -0.14
angs        -0.14
asaki       -0.14
ande        -0.14
ented       -0.13
Positive Logits
reads       0.18
liest       0.18
resse       0.17
night       0.16
bye         0.16
lier        0.15
owler       0.15
意思        0.15
loe         0.15
isy         0.15
Activations Density: 0.057%