INDEX
Explanations
clothing-related items, specifically focusing on hoods
references to hoods and hooded garments
Negative Logits
ngth      -0.81
lihood    -0.79
ãĤ´ãĥ³    -0.77
MENT      -0.75
FORM      -0.74
andum     -0.73
tery      -0.73
×ķ        -0.72
igslist   -0.67
berman    -0.66
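Two of the negative-logit tokens above (ãĤ´ãĥ³ and ×ķ) look like raw vocabulary strings from a byte-level BPE tokenizer rather than corrupted text. The sketch below assumes the model uses the GPT-2 style byte-to-unicode table, which this page does not confirm; `decode_token` is a hypothetical helper name introduced only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a vocabulary string back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial UTF-8 sequences are possible for single tokens, so do not raise.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ´ãĥ³", "×ķ", "hood"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the two entries would decode to the katakana string ゴン and the Hebrew letter ו, while plain ASCII tokens such as hood pass through unchanged.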
POSITIVE LOGITS
ornament  1.04
ie        1.03
oos       1.01
ed        1.00
tip       0.90
sie       0.90
ies       0.89
ook       0.85
hood      0.85
edo       0.84
Activations Density 0.036%
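For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of sampled tokens on which the feature's activation is above zero. That definition is an assumption rather than something this page states, the function name `activation_density` is hypothetical, and the activation values are synthetic, chosen only so the arithmetic reproduces 0.036%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example, not real data: 36 active tokens out of 100,000 sampled.
synthetic = [0.0] * 99_964 + [1.5] * 36
density = activation_density(synthetic)

if __name__ == "__main__":
    print(f"{density:.3%}")
```

A density this low would mean the feature fires on roughly one token in every few thousand, which fits a feature tied to a narrow topic such as hooded garments.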