INDEX
Explanations
the word "preferred."
the word "preferred" in various contexts
Negative Logits
Token       Logit
Tour        -0.75
arta        -0.72
á           -0.72
del         -0.71
bane        -0.70
akening     -0.70
amaz        -0.70
inas        -0.69
alien       -0.68
Breaking    -0.68
Positive Logits
Token        Logit
preferred     1.22
embodiments   0.90
preferring    0.81
Preferred     0.79
prefers       0.78
plurality     0.76
favoured      0.76
endings       0.76
embodiment    0.75
favored       0.75
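The sketch below shows how ranked lists like the two above can be produced. It assumes they come from projecting the feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest entries. That assumption is common in SAE tooling but is not stated on this page. The function names, the 3-dimensional residual stream, the 5-token vocabulary and all numbers are hypothetical toy values. They do not reproduce the dashboard figures.

```python
def project_to_vocab(decoder_dir, unembed):
    """Hypothetical helper: logit for token t is dot(decoder_dir, unembed[:, t]).

    unembed is a d_model x n_vocab matrix given as a list of rows.
    """
    d_model = len(decoder_dir)
    n_vocab = len(unembed[0])
    return [
        sum(decoder_dir[k] * unembed[k][t] for k in range(d_model))
        for t in range(n_vocab)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    bottom = [(vocab[t], logits[t]) for t in order[:k]]
    top = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example (assumed values, not taken from the real model).
vocab = ["preferred", "favored", "alien", "Tour", "the"]
decoder_dir = [1.0, 0.5, -1.0]
unembed = [
    [1.0, 0.5, -0.5, 0.0, 0.1],
    [0.4, 0.6, 0.0, -0.4, 0.0],
    [0.0, 0.1, 0.3, 0.5, 0.1],
]
logits = project_to_vocab(decoder_dir, unembed)
top, bottom = top_and_bottom(logits, vocab, k=2)
print("positive:", [(tok, round(v, 2)) for tok, v in top])
print("negative:", [(tok, round(v, 2)) for tok, v in bottom])
```

Sorting once and slicing from both ends yields both lists in a single pass. At real vocabulary sizes, a partial selection such as `heapq.nlargest` would avoid the full sort.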
Activation density: 0.005%
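Activation density is usually read as the fraction of token positions on which the feature fires. Under that reading, 0.005% is roughly 5 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the synthetic activation list are hypothetical. The dashboard's exact definition and sample set are not given here.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic data: the feature fires on 5 of 100,000 token positions.
acts = [0.0] * 99_995 + [1.3] * 5
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low marks the feature as very sparse. That fits an explanation tied to a single word such as "preferred".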