INDEX
Explanations
phrases related to headwear and clothing items
Negative Logits
citiz           -0.75
ITNESS          -0.74
newsp           -0.73
practition      -0.73
Accountability  -0.73
KNOWN           -0.71
GoldMagikarp    -0.71
Reviewer        -0.70
Qual            -0.70
adolesc         -0.66
Positive Logits
mith     1.30
poons    1.25
cale     1.25
creen    1.15
etting   1.13
hell     1.13
poon     1.12
avers    1.11
hare     1.10
ucker    1.08
Activations Density: 0.226%