Explanations
mentions of clothing or accessories
phrases that describe various items of clothing being worn by individuals
Negative Logits
Token       Logit
ancock      -0.91
Dhabi       -0.82
terness     -0.80
agonist     -0.78
Fiscal      -0.77
Distance    -0.75
raq         -0.74
Reviewer    -0.74
Seeking     -0.72
orrow       -0.72
Positive Logits
Token           Logit
sleeves         0.94
nails           0.93
cloth           0.85
parap           0.84
panties         0.83
prost           0.83
gloves          0.83
mustache        0.83
dolls           0.80
stereotypical   0.80
Activations Density 0.184%