INDEX
Explanations
references to physical attractiveness, specifically the term "handsome."
Negative Logits
aval    -0.16
zer     -0.15
upa     -0.15
čka     -0.15
ivate   -0.15
715     -0.14
arsers  -0.14
ilig    -0.14
udeau   -0.14
887     -0.14
Positive Logits
riott   0.15
irt     0.15
形      0.14
Bylo    0.14
Sachs   0.14
esson   0.14
sap     0.14
anned   0.14
vell    0.14
mere    0.14
Activations Density 0.001%