INDEX
Explanations
references to the body and its physiological aspects
Negative Logits
Token     Logit
Hoover    -0.80
Mub       -0.76
Crime     -0.74
Booth     -0.72
hift      -0.71
Doodle    -0.71
Clover    -0.71
Ń·        -0.69
Kafka     -0.67
Ans       -0.66
Positive Logits
Token     Logit
builders  1.16
building  1.15
builder   1.10
guards    1.10
fat       0.97
weight    0.97
guard     0.95
politic   0.94
fluids    0.89
cavity    0.88
Activations Density: 0.017%