INDEX
Explanations
words related to the human body
references to the human body and its physical aspects
Negative Logits
Hoover    -0.84
Clover    -0.76
Ans       -0.73
Kafka     -0.72
Booth     -0.72
Mub       -0.67
Nex       -0.66
Trey      -0.65
Doodle    -0.65
yrinth    -0.64
Positive Logits
guards    1.30
building  1.26
builders  1.23
builder   1.18
weight    1.11
guard     1.11
parts     1.08
politic   0.99
fat       0.99
wash      0.94
Activations Density 0.028%