Explanations
phrases expressing enjoyment and positive experiences
Negative Logits
Cler        -0.18
else        -0.16
Else        -0.14
つ          -0.14
tan         -0.14
tera        -0.14
else        -0.14
ultipart    -0.14
Else        -0.14
dit         -0.13
Positive Logits
apus        0.16
lein        0.14
stell       0.14
asty        0.14
یه          0.14
řím         0.14
ิทย         0.14
řich        0.14
θέ          0.14
cái         0.13
Activations Density 0.080%
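
The activation density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a strictly
    positive activation. The dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [0.7, 1.2, 0.3, 2.1, 0.9, 0.4, 1.5, 0.6]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.080%
```

A density this low would mean the feature is sparse: under the assumed definition it fires on roughly 8 of every 10,000 tokens.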