INDEX
Explanations
phrases related to clothing items and styles
references to specific clothing and fashion items
Negative Logits
downstream  -0.77
onential    -0.75
rencies     -0.74
terness     -0.72
Nuclear     -0.72
ithmetic    -0.70
ETHOD       -0.70
Torrent     -0.70
Kumar       -0.69
uclear      -0.68
Positive Logits
worn        1.52
wardrobe    1.26
scarf       1.26
waist       1.25
adorned     1.23
trousers    1.22
sleeves     1.21
hairst      1.21
underwear   1.18
wore        1.18
Activations Density: 0.555%