Explanation
Phrases that ask questions or express gratitude
Negative Logits
Token     Logit
avour     -0.16
altung    -0.15
ynes      -0.15
uther     -0.15
psc       -0.14
burgh     -0.14
ynec      -0.13
орож      -0.13
ovan      -0.13
кур       -0.13
Positive Logits
Token     Logit
proxy     0.16
indul     0.16
gangbang  0.15
tank      0.15
èĥ        0.15
Painter   0.15
cogn      0.15
cpy       0.14
Interr    0.14
gent      0.14
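The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such lists are commonly produced, assuming (the page does not say so) that each token's value is the dot product of the feature's decoder direction with that token's unembedding vector. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the vectors are toy values, not this feature's weights:

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed definition: dot product of the feature's decoder direction
    with each token's unembedding vector."""
    effects = {}
    for tok, vec in unembed.items():
        if len(vec) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects[tok] = sum(d * u for d, u in zip(decoder_dir, vec))
    return effects


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Toy 2-dimensional example (made-up numbers, not this feature's weights).
decoder_dir = [1.0, 0.0]
unembed = {"a": [0.5, 1.0], "b": [-0.3, 2.0], "c": [0.1, -1.0]}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(pos, neg)  # prints [('a', 0.5)] [('b', -0.3)]
```

With a real vocabulary of tens of thousands of tokens this would normally be a single matrix-vector product in a numerical library; the pure-Python loop only keeps the sketch dependency-free.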
Activation Density: 0.032%
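Assuming density is the share of token positions on which the feature fires, 0.032% works out to about 32 of every 100,000 tokens, so the feature is rarely active. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 over some evaluation corpus the page does not name; `activation_density` and `threshold` are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes a token counts as active when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: 32 active positions out of 100,000 reproduces 0.032%.
acts = [0.0] * 99_968 + [1.3] * 32
print(f"{activation_density(acts):.3f}%")  # prints 0.032%
```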