INDEX
Explanations
words related to preferences and choices
terms related to preferences
Negative Logits
meal          -0.74
mans          -0.73
manship       -0.69
mberg         -0.67
Interstitial  -0.67
FORMATION     -0.67
together      -0.67
rooms         -0.66
semble        -0.64
angel         -0.64
Positive Logits
preferences   1.02
favoring      0.90
favoured      0.84
preference    0.82
eering        0.77
ministic      0.76
favors        0.76
fav           0.76
Preferences   0.75
ately         0.75
Activations Density 0.039%