INDEX
Explanations
references to hoods or hooded garments
Negative Logits
quis     -0.17
geber    -0.15
UTOR     -0.15
apper    -0.15
tran     -0.15
ube      -0.15
Wilde    -0.14
nels     -0.14
cce      -0.14
axon     -0.14
Positive Logits
lum      0.34
oo       0.23
ed       0.22
Hood     0.22
rat      0.21
ies      0.20
rats     0.19
ie       0.17
Nack     0.17
hood     0.16
Activations Density: 0.005%
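As a rough illustration of what a density figure like this can mean, the sketch below assumes that "activation density" is the percentage of sampled tokens on which the feature has a nonzero (positive) activation. That definition, the function name `activation_density`, and the sample values are all assumptions for illustration and are not taken from the page itself.

```python
def activation_density(activations):
    """Return the percentage of activation values that are strictly positive.

    Assumes density = (tokens with activation > 0) / (total tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 5 tokens out of 100,000.
sample = [0.0] * 99_995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that assumption, a density of 0.005% corresponds to roughly 5 active tokens per 100,000 sampled, which would fit a feature tied to a narrow concept such as hoods.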