Explanations
mentions of clothing items, particularly hats and headwear
Negative Logits
Token      Logit
arda       -0.17
yte        -0.17
bufsize    -0.15
MOVED      -0.15
thighs     -0.14
dge        -0.14
shirt      -0.14
Sofa       -0.13
icha       -0.13
æijĨ       -0.13
Positive Logits
Token      Logit
hat        0.54
hats       0.49
hat        0.47
Hat        0.46
Hat        0.42
Hats       0.40
帽         0.39
_hat       0.36
cap        0.35
caps       0.34
Activations Density 0.056%
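
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number could be computed: the fraction of sampled tokens on which the feature fires at all. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical; the dashboard's actual definition and sample size are not stated in the source.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    nonzero (positive) feature activation. This is an illustrative
    definition, not one confirmed by the source.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 tokens, of which 56 activate the feature.
sample = [0.0] * 100_000
for i in range(56):
    sample[i * 1000] = 1.0

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.056% would mean the feature is active on roughly 56 of every 100,000 tokens, which fits a narrow feature that responds mostly to hat-related tokens.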