Explanations
words related to preferences
references to personal preferences and choices
Negative Logits
mberg           -0.80
mans            -0.76
Adams           -0.75
wordpress       -0.75
WARN            -0.72
bane            -0.72
Sus             -0.70
Interstitial    -0.69
manship         -0.68
icles           -0.67
Positive Logits
preferences     1.09
favoring        0.95
preference      0.91
eering          0.87
pane            0.80
favoured        0.80
favors          0.79
ļéĨĴ            0.74
elig            0.72
favored         0.72
Activations Density 0.018%