INDEX
Explanations
words related to clothing items, specifically hoods
references to hoods or hooded garments
Negative Logits
Token     Logit
ngth      -0.80
andum     -0.78
lihood    -0.78
FORM      -0.78
tery      -0.73
MENT      -0.69
×ķ        -0.67
yond      -0.66
TABLE     -0.65
utics     -0.65
Positive Logits
Token     Logit
oos       1.11
ie        1.04
sie       1.00
ed        0.98
oo        0.94
ies       0.94
tip       0.93
ornament  0.93
elled     0.92
edo       0.86
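The two lists above are the extremes of a per-token score. A common construction, assumed here rather than confirmed by this dashboard, projects the feature's decoder direction through the model's unembedding matrix and ranks the results. The sketch below uses hypothetical names (`logit_extremes`, `decoder_dir`, `W_U`, `vocab`) and plain Python lists. `W_U` is taken to have shape (d_model, vocab_size).

```python
def logit_extremes(decoder_dir, W_U, vocab, k=10):
    """Return the k highest- and k lowest-scoring tokens for one feature.

    Assumption: each token's score is the dot product of the feature's
    decoder direction with that token's column of the unembedding W_U.
    """
    vocab_size = len(vocab)
    # scores[j] = sum_i decoder_dir[i] * W_U[i][j]
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]
    # Sort token indices by score, ascending
    order = sorted(range(vocab_size), key=lambda j: scores[j])
    bottom = [(vocab[j], scores[j]) for j in order[:k]]
    top = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return top, bottom


# Toy example: a 2-dimensional model with a 3-token vocabulary
top, bottom = logit_extremes([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]],
                             ["a", "b", "c"], k=1)
print(top, bottom)
```

In a real dashboard the vocabulary has tens of thousands of entries, and a vectorized array library would replace the nested loops.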
Activation Density: 0.057%
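Activation density is typically the fraction of sampled tokens on which the feature's activation is above zero. Under that reading, 0.057% means about 57 firings per 100,000 tokens, so this is a sparse feature. The minimal sketch below assumes that definition. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: density = (# tokens with activation > threshold) / (# tokens).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 57 nonzero activations out of 100,000 tokens
acts = [1.0] * 57 + [0.0] * (100_000 - 57)
print(f"{100 * activation_density(acts):.3f}%")
```

Multiplying by 100 converts the fraction into the percentage format the dashboard shows.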