Explanations
expressions about desire or willingness for something
expressions of desire or intent related to wanting something
Negative Logits
Token   Logit
eda     -0.64
Å       -0.61
dos     -0.61
sweep   -0.60
itzer   -0.60
Y       -0.59
%%      -0.58
Dome    -0.58
late    -0.56
hov     -0.55
Positive Logits
Token       Logit
wanting     3.67
needing     2.12
wishing     2.03
liking      1.63
preferring  1.53
intending   1.51
hating      1.47
lacking     1.42
longing     1.34
craving     1.32
Activations Density 0.011%