INDEX
Explanations
references to parts of the face, like lips, cheeks, thighs, and saliva
references to body parts, particularly those associated with intimacy and attraction
Negative Logits
Token       Logit
CAST        -0.90
ENCY        -0.76
udeb        -0.71
Rational    -0.70
æ©Ł         -0.67
Epic        -0.67
é¾įåĸļ士    -0.65
KNOWN       -0.65
à¦          -0.65
ãĤ¤ãĥĪ      -0.65
Positive Logits
Token       Logit
lips        1.31
creen       1.22
mith        1.20
pring       1.06
lip         1.03
ipop        0.97
cheeks      0.94
ĸļ          0.92
terday      0.88
lip         0.87
Activations Density 0.008%
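
Several tokens in the logit lists (for example æ©Ł, é¾įåĸļ士, ãĤ¤ãĥĪ) look like encoding debris but are consistent with the byte-to-Unicode aliases used by GPT-2-style byte-level BPE vocabularies. The page does not name the tokenizer, so treat that as an assumption. Under that assumption, the sketch below rebuilds the alias table and maps a token string back to readable text. The helper name decode_token is hypothetical, and tokens that hold only part of a multi-byte character (such as à¦ or ĸļ) cannot be fully decoded; they come back with a replacement character.

```python
def bytes_to_unicode():
    """Byte -> printable-character alias table in the style of GPT-2 byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes are represented by themselves.
    mapping = {b: chr(b) for b in printable}
    # Every other byte is shifted to an unused code point starting at U+0100.
    n = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping


# Inverse table: alias character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: turn a byte-level token string into text.

    Assumes the GPT-2-style alias table above. Characters outside the
    table (already-decoded text) are passed through unchanged.
    """
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    # Incomplete UTF-8 sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æ©Ł", "é¾įåĸļ士", "ãĤ¤ãĥĪ", "à¦", "lips"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as lips or CAST pass through unchanged, so the helper can be applied to every row of both lists.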