Explanations
words related to approval and endorsement
Negative Logits
Token    Logit
ynos     -0.17
ideon    -0.15
anny     -0.15
ieren    -0.15
ii       -0.14
ugu      -0.14
yer      -0.14
sin      -0.14
eton     -0.14
umann    -0.13
Positive Logits
Token         Logit
ebek          0.21
eck           0.18
º             0.16
uzz           0.15
apesh         0.15
escorte       0.14
.transport    0.14
åĽ            0.14
èĤĸ           0.14
obraz         0.14
Activations Density: 0.006%