INDEX
Explanations
expressions of personal struggle or frustration
Negative Logits
ulong     -0.18
urge      -0.17
á»ĵ       -0.16
voie      -0.16
ceive     -0.15
oyo       -0.15
å¼ĥ       -0.14
atsapp    -0.14
suce      -0.14
rike      -0.14
Positive Logits
hav       0.37
mov       0.35
tak       0.35
mixin     0.33
makin     0.32
com       0.30
signin    0.30
lett      0.29
mov       0.28
adin      0.28
Activations Density: 0.143%