INDEX
Explanations
mentions related to fashion brands and endorsements
Negative Logits
rend       -1.04
ugal       -0.86
aim        -0.79
aration    -0.75
ync        -0.75
reating    -0.74
HCR        -0.74
hew        -0.72
rarily     -0.72
Invalid    -0.72
Positive Logits
remotely        1.12
though          0.89
handedly        0.88
tho             0.85
hinted          0.85
indirectly      0.84
joked           0.80
romeda          0.79
occasionally    0.78
swick           0.77
Activations Density: 0.549%