INDEX
Explanations
phrases related to self-presentation and dress codes
Negative Logits
ucz           -0.16
Roof          -0.14
ometown       -0.14
Sofa          -0.13
gums          -0.13
HANDLE        -0.13
roofs         -0.13
stabilized    -0.12
ÑĢеб          -0.12
roof          -0.12
Positive Logits
clothing      0.52
dress         0.50
clothes       0.49
dressing      0.49
fashion       0.48
wardrobe      0.47
Dress         0.46
Clothing      0.45
Fashion       0.42
dresses       0.42
Activations Density: 0.442%