Explanations
phrases indicating desire or preference
expressions of desire or intent
Negative Logits
errors          -0.69
livious         -0.68
enthusi         -0.66
iky             -0.63
EStreamFrame    -0.63
mitter          -0.62
squ             -0.62
dq              -0.61
Dro             -0.60
depend          -0.60
Positive Logits
sake            0.71
awaru           0.70
reprene         0.69
purposes        0.69
fuller          0.66
better          0.66
anything        0.64
acion           0.63
Continue        0.63
succeed         0.62
Activations Density 0.059%