INDEX
Explanations
references to physical body features or modifications
Negative Logits
Token        Logit
ablishment   -0.72
ĨĴ           -0.71
ostics       -0.66
eers         -0.65
ADRA         -0.63
Bulldogs     -0.63
hower        -0.62
Luk          -0.62
ablish       -0.60
prest        -0.60
Positive Logits
Token        Logit
red          1.13
lets         1.05
ring         1.04
lett         1.01
crow         0.97
fed          0.96
let          0.93
abs          0.93
face         0.93
uler         0.93
Activations Density: 0.020%