INDEX
Explanations
preferences and choices related to various topics
phrases expressing preference or recommendation between options.
Negative Logits
//           -0.28
umpang       -0.27
merak        -0.26
บล           -0.24
делу         -0.24
Italijanski  -0.22
hilangan     -0.21
žnost        -0.21
chluss       -0.20
Sorg         -0.20
Positive Logits
preference   3.31
prefer       3.20
preferred    3.08
Preference   2.94
prefers      2.92
prefer       2.92
Prefer       2.91
preferring   2.86
preference   2.80
preferred    2.73
Activations Density: 0.684%