INDEX
Explanations
words related to physical appearance or attractiveness
Negative Logits
Token        Logit
ÄĽle         -0.16
meteor       -0.15
olest        -0.14
571          -0.14
_iface       -0.14
forman       -0.13
457          -0.13
_interfaces  -0.13
icipant      -0.13
headline     -0.13
Positive Logits
Token        Logit
ìĭ¸          0.16
a            0.15
encia        0.15
enf          0.15
Sick         0.15
ÏĢοÏĦε       0.15
o            0.14
course       0.14
orch         0.14
Surre        0.14
Activations Density: 0.009%