INDEX
Explanations
phrases indicating desire or permission to act freely without restriction
expressions of desire or willingness
Negative Logits
Token                      Logit
livious                    -0.76
enthusi                    -0.71
cling                      -0.67
ynski                      -0.64
onut                       -0.61
BuyableInstoreAndOnline    -0.59
Echoes                     -0.58
pher                       -0.57
gre                        -0.57
faint                      -0.56
Positive Logits
Token                      Logit
PLA                        0.74
without                    0.74
mares                      0.72
WITHOUT                    0.69
to                         0.69
regardless                 0.69
ably                       0.67
them                       0.66
.                          0.65
*.                         0.65
Activations Density 0.052%