INDEX
Explanations
phrases related to enjoying, experiencing, or interacting with something
phrases related to ending or concluding events or experiences
Negative Logits
Ĭ±              -0.57
los             -0.56
disbanded       -0.52
Reconstruction  -0.52
Ń               -0.51
Bridge          -0.51
¬¼              -0.51
Internal        -0.51
instituted      -0.50
Historically    -0.49
Positive Logits
yourself        1.33
yourselves      1.23
Yourself        0.94
your            0.87
YOUR            0.79
your            0.69
Your            0.63
Your            0.60
browsing        0.59
omet            0.58
Activations Density 0.732%