INDEX
Explanations
phrases indicating contrasting information
phrases that begin with "Yet."
Negative Logits
starting        -0.63
basically       -0.60
interests       -0.59
balls           -0.59
grav            -0.56
orders          -0.56
comm            -0.56
advis           -0.56
kind            -0.54
periodically    -0.54
Positive Logits
Yet             3.21
Yet             2.72
yet             2.53
yet             2.09
Nonetheless     1.51
Nevertheless    1.48
But             1.44
Indeed          1.40
Worse           1.34
Moreover        1.32
Activations Density 0.018%