Explanations
words related to the human body
mentions of body and body image
Negative Logits
Hoover          -0.80
Kafka           -0.71
Clover          -0.71
Ans             -0.68
ãĥĻ (ベ)        -0.68
Pis             -0.68
Booth           -0.67
Nex             -0.67
Dickens         -0.62
Jarrett         -0.62
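The entry ãĥĻ is probably not corrupted text. It is more likely the raw string of a byte-level BPE token, assuming this dashboard uses a GPT-2-style tokenizer. That tokenizer maps every byte to a printable character. The sketch below rebuilds that byte-to-character table, following the mapping in GPT-2's released encoder, and inverts it. The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level BPE token string back into readable text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĻ"))  # ベ
```

Under this mapping, the three characters correspond to the UTF-8 bytes E3 83 99. Those bytes encode the katakana ベ.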
Positive Logits
guards           1.34
builders         1.19
guard            1.19
building         1.18
builder          1.14
weight           1.02
parts            1.01
politic          0.94
wash             0.92
anguage          0.91
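Dashboards of this kind typically build both lists the same way. They project the feature's output (decoder) direction onto the model's unembedding matrix, then read off the largest and smallest entries. The sketch below assumes that convention. `direction`, `W_U` and the toy vocabulary are hypothetical stand-ins filled with random numbers, not this feature's actual weights, and a real dashboard may also rescale the scores.

```python
import numpy as np


def top_bottom_logits(direction, W_U, vocab, k=10):
    """Return the k highest- and k lowest-scoring tokens for one feature direction.

    direction: (d_model,) decoder vector of the feature (assumed convention)
    W_U:       (d_model, vocab_size) unembedding matrix
    """
    scores = direction @ W_U  # (vocab_size,) direct effect on each token's logit
    order = np.argsort(scores)  # ascending order of scores
    bottom = [(vocab[i], float(scores[i])) for i in order[:k]]
    top = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top, bottom


# Toy data: random weights and placeholder token names (hypothetical).
rng = np.random.default_rng(0)
d_model = 16
vocab = [f"tok{i}" for i in range(6)]
W_U = rng.normal(size=(d_model, len(vocab)))
direction = rng.normal(size=d_model)

top, bottom = top_bottom_logits(direction, W_U, vocab, k=3)
print("positive:", top)
print("negative:", bottom)
```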
Activation density: 0.033%
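Activation density is usually the share of sampled tokens on which the feature's activation is positive. This sketch assumes that definition and a threshold of zero. At 0.033%, the feature fires on roughly one token in 3,000. The activations below are made-up values for illustration, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Hypothetical activations over 10,000 tokens; the feature fires on 3 of them.
acts = np.zeros(10_000)
acts[[17, 4200, 9001]] = [1.5, 0.3, 2.2]

density = activation_density(acts)
print(f"{density:.3%}")  # 0.030%
```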