INDEX
Explanations
phrases posing a question or issue
points of contention or debate in discussions
Negative Logits
supposedly       -0.70
latter           -0.66
U+05BC (Hebrew point dagesh)   -0.64
comparatively    -0.63
ostensibly       -0.63
largely          -0.63
inexpl           -0.62
purported        -0.61
evidently        -0.61
purportedly      -0.60
Positive Logits
Yourself         1.05
yourself         1.02
yourselves       0.89
Your             0.84
YOUR             0.83
your             0.80
your             0.79
Reduce           0.78
!:               0.77
Submit           0.77
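These two lists are consistent with the usual way SAE feature dashboards rank tokens: take the feature's decoder direction, dot it with each vocabulary token's unembedding vector, and read off the most negative and most positive results. The sketch below assumes that method and uses toy 2-dimensional vectors, not real model weights; `top_logit_effects`, `decoder_direction`, and `unembedding` are hypothetical names, and whether this dashboard applies a final layer norm or other scaling first is not stated here.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    Hypothetical helper: each token's effect is the dot product of the
    feature's decoder direction with that token's unembedding vector.
    """
    effects = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    # Sort ascending so the most negative effects come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Toy example: made-up vectors, not taken from any real model.
direction = [1.0, -0.5]
vocab = {
    "yourself": [1.0, 0.0],
    "your": [0.5, 0.2],
    "supposedly": [-0.6, 0.2],
    "latter": [-0.4, 0.4],
}
neg, pos = top_logit_effects(direction, vocab, k=2)
print([token for token, _ in neg])  # ['supposedly', 'latter']
print([token for token, _ in pos])  # ['yourself', 'your']
```

Because the ranking runs over individual vocabulary entries, case and spacing variants of one word (`Your`, `YOUR`, `your`) show up as separate rows with their own values.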
Activation Density: 0.567%
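Activation density is commonly defined as the fraction of evaluated tokens on which the feature's activation is strictly positive. Assuming that definition, 0.567% means the feature fires on roughly 1 token in 176. A minimal sketch with made-up activations; `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 8 tokens; 2 of them are positive.
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(acts):.3%}")  # 25.000%

# Inverting the reported density gives tokens per activation.
print(round(1 / 0.00567))  # 176
```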