Explanations
references to the human body and its characteristics
Negative Logits
Token       Logit
avigator    -0.16
umber       -0.15
stin        -0.15
nable       -0.15
sti         -0.15
abelle      -0.15
enberg      -0.15
coh         -0.14
maj         -0.14
bove        -0.14
Positive Logits
Token       Logit
guards      0.23
guard       0.21
weight      0.20
gren        0.16
wide        0.16
gaard       0.16
elter       0.15
558         0.15
phận        0.15
-body       0.14
Activations Density 0.042%