INDEX
Explanations
phrases indicating desires or intentions
expressions of desire or intent
Negative Logits
pite          -0.80
muster        -0.74
rir           -0.69
recomm        -0.67
yielding      -0.63
render        -0.62
permitting    -0.62
Cosponsors    -0.62
ously         -0.61
attempting    -0.61
Positive Logits
someday       0.93
ASAP          0.82
louder        0.78
cool          0.68
sooner        0.68
revenge       0.67
reprene       0.66
daddy         0.65
rid           0.65
bigger        0.65
Activations Density: 0.322%