INDEX
Explanations
references to clothing items, particularly various types of coats and shirts
Negative Logits
alat       -0.15
entin      -0.15
thon       -0.15
ertools    -0.15
/format    -0.14
hip        -0.14
ob         -0.14
entar      -0.14
etto       -0.14
dressed    -0.13
Positive Logits
worn           0.34
wearer         0.23
wear           0.19
पह             0.17
unic           0.17
malfunction    0.16
tails          0.16
rental         0.16
Rental         0.16
Optional       0.15
Activations Density: 0.128%