INDEX
Explanations
instances of clothing and accessories being worn or described
Negative Logits
wear          -0.17
wearable      -0.16
olate         -0.16
flashlight    -0.16
wearer        -0.15
shine         -0.15
vier          -0.15
orea          -0.14
xes           -0.14
alat          -0.14
Positive Logits
hij           0.18
nothing       0.17
εÏĨ           0.16
LOB           0.15
Hij           0.14
cuckold       0.14
_RAM          0.14
501           0.14
nothing       0.14
ismatch       0.14
Activations Density 0.037%