INDEX
Explanations
phrases indicating a preference for one option over another
phrases that reflect a preference for one option or criterion over another
Negative Logits
aeus      -0.76
aspers    -0.70
rend      -0.70
obal      -0.69
worm      -0.68
omach     -0.68
rake      -0.65
ospons    -0.64
cia       -0.63
hiba      -0.63
Positive Logits
ãĤ¦ãĤ¹    0.71
âĢİ       0.70
theirs    0.69
à¨        0.69
à¨        0.61
rather    0.60
GROUND    0.60
Eternity  0.59
others    0.59
iman      0.59
Activations Density 0.392%