INDEX
Explanations
phrases indicating a desire or request
expressions of desire or intent
Negative Logits
Token        Logit
correct      -0.69
forth        -0.65
reg          -0.64
seals        -0.63
fragments    -0.63
consistent   -0.63
borne        -0.63
hy           -0.62
stag         -0.62
propag       -0.62
Positive Logits
Token        Logit
Want         3.11
Want         3.11
want         1.68
WANT         1.56
Need         1.33
Interested   1.25
Wanted       1.25
Need         1.16
want         1.11
Feel         1.11
Activations Density: 0.013%
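
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 13 of which are active.
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 7000] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.013% corresponds to roughly 13 active positions per 100,000 tokens, which fits a feature that fires only on a narrow family of tokens.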