Explanations
expressions of enjoyment and positive experiences
Negative Logits
ียด           -0.15
individ       -0.14
seins         -0.14
Injector      -0.14
AndGet        -0.14
ahren         -0.13
ocol          -0.13
avana         -0.13
$#            -0.13
ipop          -0.13
Positive Logits
ayi            0.16
somehow        0.15
treff          0.14
escap          0.14
VERR           0.14
satisfaction   0.14
nào            0.14
хи             0.13
able           0.13
bsolute        0.13
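The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. This page does not say how the scores were computed. The sketch below assumes a common convention for such dashboards: take the dot product of the feature's decoder direction with each token's column of the unembedding matrix W_U, then sort. The helper names (`logit_effects`, `top_and_bottom`) and all vectors are hypothetical toy values, not data from this feature.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (assumed convention, see lead-in)."""
    if any(len(col) != len(decoder_dir) for col in unembed.values()):
        raise ValueError("decoder direction and unembedding columns differ in size")
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: a 3-dimensional residual stream with made-up vectors.
decoder = [1.0, 0.0, -0.5]
W_U = {
    "satisfaction": [0.2, 0.1, 0.0],
    "somehow": [0.1, 0.3, 0.0],
    "Injector": [0.0, 0.0, 0.3],
    "ocol": [-0.1, 0.0, 0.0],
}
neg, pos = top_and_bottom(logit_effects(decoder, W_U), k=2)
print("negative:", [(t, round(v, 2)) for t, v in neg])
print("positive:", [(t, round(v, 2)) for t, v in pos])
```

With real model weights, the same ranking over the full vocabulary would yield lists shaped like the two above. Any layer-norm folding or scaling the dashboard may apply is not modeled here.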
Activation Density 0.007%
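Activation density is presumably the share of sampled token positions on which the feature fires. At 0.007%, that works out to roughly 7 active positions per 100,000 tokens. The minimal sketch below assumes that "active" means an activation strictly above zero. The page gives neither the threshold nor the sample size, and the function name and sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not taken from the dashboard)."""
    total = active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: 100,000 token positions, 7 of which fire.
acts = [0.0] * 99_993 + [1.2] * 7
print(f"Activation density: {activation_density(acts):.3f}%")
```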