INDEX
Explanations
mentions of T-shirts with specific characteristics or messages
references to T-shirts
Negative Logits
Token      Logit
minimized  -0.71
theless    -0.69
pim        -0.66
prof       -0.65
degraded   -0.65
wiret      -0.63
rhy        -0.61
quar       -0.60
halluc     -0.60
sanct      -0.60
Positive Logits
Token      Logit
shirt      1.05
shirts     1.01
rex        0.92
minus      0.90
iron       0.90
agonist    0.90
adic       0.86
level      0.85
squ        0.82
eye        0.81
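Dashboards like this one commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then taking the k largest and k smallest entries. The card itself does not say how its lists were computed, so the following is only a minimal sketch under that assumption. The names `w_dec_row`, `w_u`, and `vocab` are hypothetical, and the toy numbers are illustrative, not taken from this feature.

```python
def logit_effects(w_dec_row, w_u):
    """Project one feature's decoder vector onto every vocabulary column.

    w_dec_row: list of d_model floats (hypothetical decoder direction).
    w_u: d_model rows, each a list of vocab_size floats (hypothetical unembedding).
    Returns vocab_size floats, i.e. the vector-matrix product w_dec_row @ w_u.
    """
    vocab_size = len(w_u[0])
    return [
        sum(w_dec_row[i] * w_u[i][j] for i in range(len(w_dec_row)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return (top_k, bottom_k) lists of (token, effect) pairs."""
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    bottom = [(vocab[j], effects[j]) for j in order[:k]]          # most negative first
    top = [(vocab[j], effects[j]) for j in reversed(order[-k:])]  # most positive first
    return top, bottom


# Toy example: d_model = 2, vocabulary of four made-up tokens.
vocab = ["a", "b", "c", "d"]
w_u = [[1.0, 0.0, -1.0, 2.0],
       [0.0, 1.0, 0.0, -1.0]]
w_dec_row = [1.0, 0.5]
effects = logit_effects(w_dec_row, w_u)
top, bottom = top_and_bottom(effects, vocab, k=2)
print(top)     # [('d', 1.5), ('a', 1.0)]
print(bottom)  # [('c', -1.0), ('b', 0.5)]
```

The ordering matches the card: the negative list runs from the most negative value upward, and the positive list runs from the largest value downward.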
Activations Density 0.052%
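Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, which is an assumption because the card does not state its threshold. The function name `activation_density` is hypothetical. On that reading, 0.052% corresponds to roughly 52 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: a feature "fires" when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 52 firing positions out of 100,000 gives the density shown on this card.
acts = [0.0] * 99_948 + [1.0] * 52
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.052%
```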