Explanations
calls to relax and enjoy various activities
phrases related to enjoyment and relaxation
Negative Logits
so          -0.69
£ı          -0.69
terday      -0.65
Therefore   -0.63
Reply       -0.61
Same        -0.61
mishand     -0.60
aneous      -0.59
sole        -0.59
wrongly     -0.58
Positive Logits
yourselves  1.51
yourself    1.40
Yourself    1.06
your        1.01
;)          0.99
yours       0.97
!           0.89
)!          0.86
YOUR        0.85
ðŁĻĤ        0.84
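Tokens such as ðŁĻĤ and £ı look garbled because they appear to be shown in a byte-level BPE vocabulary's printable form rather than as decoded text. The sketch below assumes a GPT-2-style byte-to-character table, which this page does not confirm; the helper names token_bytes and decode_token are hypothetical. Under that assumption, ðŁĻĤ decodes to the 🙂 emoji, which fits the other positive tokens, while £ı is an incomplete UTF-8 fragment that cannot be shown as a character on its own.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokens on this page use this mapping.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable vocabulary character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a vocabulary token string (hypothetical helper)."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode a token to text; incomplete UTF-8 fragments become U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["yourself", ";)", "\u00f0\u0141\u013b\u0124", "\u00a3\u0131"]:
        print(repr(tok), token_bytes(tok), repr(decode_token(tok)))
```

Plain ASCII tokens such as yourself pass through unchanged, so the decoding only matters for the emoji and byte-fragment entries.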
Activations Density 0.539%
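A density figure like this is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading; the definition, the zero threshold, and the function name activation_density are assumptions and are not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    values = list(activations)
    if not values:
        return 0.0
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


if __name__ == "__main__":
    # Toy data: the feature fires on 1 of 4 tokens.
    sample = [0.0, 0.0, 0.5, 0.0]
    print(f"{activation_density(sample):.3%}")
```

On that reading, 0.539% would mean the feature is active on roughly 5 of every 1,000 sampled tokens.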