Explanations
sentences signaling disapproval or criticism
sentences that conclude with punctuation marks
Negative Logits
Token           Logit
endeav          -0.75
instinct        -0.69
pill            -0.69
endeavour       -0.68
horrend         -0.68
charms          -0.67
arching         -0.65
compositions    -0.65
bred            -0.65
erning          -0.65
Positive Logits
Token           Logit
Lastly          1.27
UNCLASSIFIED    1.20
Related         1.20
Finally         1.16
Meanwhile       1.13
Another         1.11
Additionally    1.11
Asked           1.09
<|endoftext|>   1.08
Notably         1.06
Activations Density: 0.529%