INDEX
Explanations
expressions of passion and care for others
Negative Logits
Token      Logit
487        -0.16
dum        -0.16
asy        -0.16
é          -0.14
hev        -0.14
cala       -0.14
.lu        -0.14
covered    -0.14
ugg        -0.14
ĨĴ         -0.14
Positive Logits
Token      Logit
mant       0.15
genu       0.14
wanting    0.14
igr        0.14
rut        0.14
igram      0.14
à¥Ĥष       0.14
ãģĴ        0.14
tlement    0.14
REC        0.14
Activations Density 0.135%
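
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "density" means the fraction of sampled tokens on which the feature has a non-zero activation; the page does not state this definition. Under that assumption, 0.135% corresponds to roughly one activating token in every 741. The function names are hypothetical and chosen for illustration only.

```python
# Assumption: density = percentage of sampled tokens with non-zero activation.
DENSITY_PERCENT = 0.135  # value reported on the feature card


def expected_active_tokens(n_tokens: int, density_pct: float) -> float:
    """Expected number of tokens with non-zero activation among n_tokens."""
    return n_tokens * density_pct / 100.0


def tokens_per_activation(density_pct: float) -> float:
    """Average number of tokens scanned per activating token."""
    return 100.0 / density_pct


if __name__ == "__main__":
    print(expected_active_tokens(1_000_000, DENSITY_PERCENT))
    print(round(tokens_per_activation(DENSITY_PERCENT)))
```

A feature this sparse fires rarely, so any inspection of its top activations needs a large token sample to collect more than a handful of examples.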