INDEX
Explanations
phrases that convey positive evaluation or approval
sentences that conclude with a period
Negative Logits
²              -0.69
conspicuous    -0.67
aband          -0.66
habit          -0.66
unker          -0.65
aganda         -0.65
ixture         -0.64
isphere        -0.63
accumulated    -0.63
staggered      -0.63
Positive Logits
Adds           1.02
Asked          0.98
<|endoftext|>  0.94
Ultimately     0.93
Saying         0.91
Refer          0.88
Nevertheless   0.87
Exactly        0.87
―              0.87
Such           0.87
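In dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction onto each token's unembedding vector and reporting the highest and lowest scores. The sketch below assumes that convention. The toy two-dimensional vectors, the four-token vocabulary, and the function names `logit_effects` and `top_and_bottom` are hypothetical illustrations, not values or code from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed_cols: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed_cols.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative

# Toy example: made-up 2-d vectors, not real model weights.
decoder = [1.0, -0.5]
unembed = {"Adds": [1.0, 0.0], "habit": [-0.5, 0.5],
           "Such": [0.5, 0.5], "aband": [0.0, 1.0]}
pos, neg = top_and_bottom(logit_effects(decoder, unembed), k=2)
print(pos)  # [('Adds', 1.0), ('Such', 0.25)]
print(neg)  # [('habit', -0.75), ('aband', -0.5)]
```

Real tools may also fold in a final layer norm or subtract a mean before ranking; this sketch omits such details.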
Activation Density: 0.086%
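Activation density in such dashboards is typically the share of tokens in a sample dataset on which the feature's activation is above zero. Under that assumption, 0.086% means roughly 86 firing tokens per 100,000. A minimal sketch follows; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data: 86 firing tokens out of 100,000.
acts = [0.0] * 99_914 + [1.0] * 86
print(activation_density(acts))  # 0.086
```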