Explanations
words related to negative attributes or descriptions
references to the concept of "ugliness."
Negative Logits
Token       Logit
ership      -0.95
ingham      -0.93
aver        -0.90
idential    -0.89
owder       -0.86
akings      -0.83
uther       -0.82
cedented    -0.81
OTA         -0.79
ortium      -0.77
Positive Logits
Token       Logit
ugly        1.11
adolesc     0.83
elephant    0.80
crap        0.77
beasts      0.77
spectacle   0.76
fallout     0.75
glare       0.71
fades       0.71
barb        0.70
Activations Density: 0.006%