INDEX
Explanations
references to voting choices and political decisions
Negative Logits
.boot        -0.17
nde          -0.16
moz          -0.15
IPH          -0.15
.cloudflare  -0.15
ンズ         -0.15
Normalize    -0.14
atica        -0.14
toItem       -0.14
normalize    -0.14
Positive Logits
choice       0.17
casting      0.16
choice       0.16
选           0.15  (Chinese, "choose")
choosing     0.15
candidate    0.15
alignment    0.15
eros         0.14
alignment    0.14
abst         0.14
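The two logit lists rank vocabulary tokens by how much this feature directly raises (positive) or lowers (negative) their output logits. A minimal sketch, assuming the common convention of taking the dot product of the feature's decoder direction with each token's unembedding vector; the function names and the toy 2-dimensional vectors are hypothetical and are not real model weights or this dashboard's code:

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect,
# assuming effect = dot(decoder direction, token's unembedding column).

def logit_effects(direction, unembed_cols):
    """Return {token: dot(direction, unembed_cols[token])}."""
    return {
        tok: sum(d * u for d, u in zip(direction, col))
        for tok, col in unembed_cols.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy example with made-up vectors (assumption, not real weights).
direction = [1.0, 0.5]
unembed_cols = {
    "choice": [0.1, 0.1],
    "candidate": [0.08, 0.1],
    "normalize": [-0.1, -0.05],
    ".boot": [-0.12, -0.06],
}
effects = logit_effects(direction, unembed_cols)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

With real weights, the direction and unembedding columns would come from the trained autoencoder and the model; tokens that round to the same displayed value (such as the several entries at -0.15) are ordered only by the sort.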
Activation Density: 0.178%
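Activation density is typically the fraction of token positions on which the feature fires, so 0.178% corresponds to roughly 1.78 active positions per 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the function name and default threshold are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy example: 2 of 1,000 positions fire.
acts = [0.0] * 1000
acts[10] = 3.1
acts[500] = 0.7
print(f"{activation_density(acts):.3%}")  # 0.200%
```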