INDEX
Explanations
words related to clothing or attire
references to dress codes and dressing up
Negative Logits
ntil  -0.77
venants  -0.73
SHIP  -0.72
rw  -0.70
apt  -0.68
ichael  -0.67
ocalyptic  -0.63
JV  -0.63
SPONSORED  -0.60
asper  -0.60
Positive Logits
gown  0.95
glers  0.92
rehearsal  0.90
uce  0.86
maker  0.84
bag  0.83
shirts  0.82
shoes  0.81
uniforms  0.79
dresses  0.79
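Dashboards of this kind typically produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. That this page computes its values exactly this way is an assumption, not something it states. The sketch below is a minimal pure-Python illustration of that projection; `feature_logit_effects`, the toy vectors, and the three-token vocabulary are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: score(token) = decoder_vec . unembed[:, token], i.e. the
    feature's decoder direction projected through an unembedding matrix
    of shape [d_model][vocab_size].
    """
    d_model = len(decoder_vec)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy data: d_model = 2, vocabulary of three tokens.
    vocab = ["gown", "ntil", "shoes"]
    unembed = [[0.6, -0.5, 0.4], [0.7, -0.5, 0.8]]
    neg, pos = feature_logit_effects([1.0, 0.5], unembed, vocab, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

In a real model the same idea applies with the full vocabulary and `k = 10`, which matches the length of each list above; any normalization the dashboard applies before ranking is not shown on the page and is omitted here.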
Activation Density: 0.031%
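An activation density of 0.031% means the feature is active on roughly 3 of every 10,000 tokens, so it fires sparsely. The page does not state which token sample or threshold it uses; the minimal sketch below assumes density is the fraction of token positions where the feature's activation is strictly greater than zero. The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires.

    Assumption: a feature "fires" when its activation exceeds `threshold`
    (0.0 by default), and density is active positions / total positions.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Synthetic sample: the feature fires on 31 of 100,000 token positions.
    acts = [1.0] * 31 + [0.0] * (100_000 - 31)
    density = activation_density(acts)
    print(f"Activation density: {density:.3%}")
```

With 31 active positions out of 100,000, the printed density is 0.031%, the same figure reported above; the real value depends on whatever corpus the dashboard sampled.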