INDEX
Explanations
words related to the human body
references to the human body and its attributes
Negative Logits
Hoover        -0.78
Clover        -0.73
Booth         -0.71
Ans           -0.69
ãĥĻ           -0.68
Doodle        -0.68
yrinth        -0.64
Fundamental   -0.62
Kafka         -0.62
Motorsport    -0.59
Positive Logits
guards        1.30
builders      1.26
building      1.23
builder       1.22
guard         1.15
weight        1.14
politic       1.08
fat           1.05
parts         1.00
odor          0.94
Activations Density 0.037%