INDEX
Explanations
the term "Poly" with high activation values
Negative Logits
Token       Logit
Watson      -0.71
Refuge      -0.69
Bauer       -0.67
keeper      -0.64
Redemption  -0.62
Byrd        -0.62
Lauder      -0.61
recall      -0.60
trove       -0.60
CRIP        -0.60
Positive Logits
Token       Logit
gon         1.41
phony       1.22
morph       1.21
meric       1.16
chrome      1.15
mers        1.14
ester       1.14
nesia       1.11
mer         1.08
techn       1.04
Activations Density: 0.009%