INDEX
Explanations
mentions of specific clothing items, particularly dresses
repeated mentions of the word "dress."
Negative Logits
ntil          -0.82
Blessed       -0.78
JV            -0.65
ocalyptic     -0.62
interrupted   -0.61
Cursed        -0.61
rehend        -0.61
exist         -0.60
condemned     -0.59
ventures      -0.58
Positive Logits
maker         0.96
dresses       0.96
Dress         0.94
glers         0.93
gown          0.92
rehearsal     0.92
bag           0.90
attire        0.89
shoes         0.87
dress         0.86
Activations Density: 0.009%