INDEX
Explanations
mentions of personal preferences, choices, or inclinations
references to individual preferences or choices
Negative Logits
Token        Logit
shut         -0.74
CHA          -0.73
lehem        -0.72
sum          -0.68
Alam         -0.68
shattered    -0.66
Constable    -0.66
torn         -0.65
rave         -0.64
acre         -0.63
Positive Logits
Token           Logit
preference      3.95
preferences     3.37
Preferences     2.36
preferred       1.85
preferential    1.62
aversion        1.51
preferring      1.44
Pref            1.40
prefer          1.39
liking          1.36
Activations Density: 0.017%
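
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the feature's activation exceeds a threshold; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, made up for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activation values strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. This is an illustrative definition, not one confirmed
    by the source page.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 10,000 tokens, the feature fires on two of them.
sample = [0.0] * 10_000
sample[5] = 1.3
sample[77] = 0.4

density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

Under that assumption, a density of 0.017% would mean the feature fires on roughly 17 out of every 100,000 sampled tokens.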