Explanations
references to the concept of "body" in various contexts
Negative Logits
ery        -0.24
umber      -0.21
eries      -0.18
ures       -0.17
ally       -0.16
atoria     -0.15
ophobia    -0.15
erna       -0.15
ERY        -0.15
imum       -0.15
Positive Logits
guards     0.35
guard      0.30
politic    0.27
builders   0.26
builder    0.25
weight     0.25
building   0.24
wide       0.23
mind       0.19
674        0.19
Activations Density: 0.041%