Explanations
mentions of personal preferences
words related to preferences and choices
Negative Logits
mberg               -0.85
manship             -0.77
Sus                 -0.72
wordpress           -0.69
WARN                -0.69
inventoryQuantity   -0.69
semble              -0.67
bane                -0.67
angel               -0.66
ッド                -0.66
Positive Logits
preferences          1.02
favoring             0.94
eering               0.90
preference           0.86
pane                 0.80
favoured             0.78
eus                  0.75
favored              0.73
favors               0.73
ately                0.73
Activations Density 0.025%