Explanations
words related to negative appearance or attributes
references to the concept of "ugliness."
Negative Logits
Token      Logit
ership     -0.97
ingham     -0.95
aver       -0.89
agall      -0.86
owder      -0.86
akings     -0.86
iasm       -0.85
erve       -0.84
ittee      -0.80
unes       -0.79
Positive Logits
Token      Logit
ugly       1.10
adolesc    0.89
beasts     0.79
elephant   0.75
scar       0.74
crap       0.73
mole       0.72
nasty      0.72
spectacle  0.70
scars      0.70
Activations Density: 0.019%