INDEX
Explanations
requests for interaction or feedback in a text
punctuation and expressions of engagement or prompting feedback
Negative Logits
Token        Logit
awakening    -0.83
acute        -0.79
rescuing     -0.77
corrosion    -0.73
Ͻ            -0.72
lifes        -0.72
reversing    -0.67
footing      -0.67
immersion    -0.67
emaker       -0.66
Positive Logits
Token            Logit
<|endoftext|>    1.11
Feedback         1.04
:)               0.97
Otherwise        0.97
EDIT             0.95
Comments         0.92
Suggest          0.92
Thanks           0.92
ðŁij             0.91
;)               0.91
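As a rough illustration of where logit lists like these can come from: a common approach is to project a feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below assumes that approach; it is not confirmed to be how this dashboard was produced. The function name `top_logit_effects`, the two-dimensional toy vectors, and the four-token vocabulary are all hypothetical.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=2):
    """Rank tokens by the dot product of a feature direction with each unembedding column.

    decoder_vec: list of floats, length d_model (hypothetical feature direction)
    unembed:     d_model rows, each with one entry per vocabulary token
    vocab:       list of token strings, same order as the unembed columns
    Returns (k most negative, k most positive) as lists of (token, score).
    """
    effects = [
        sum(d * row[j] for d, row in zip(decoder_vec, unembed))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy values chosen only for illustration; they are not taken from any real model.
vocab = ["Thanks", "acute", "the", "Feedback"]
decoder_vec = [1.0, -0.5]
unembed = [
    [0.9, -0.6, 0.1, 0.5],
    [-0.2, 0.4, 0.0, -0.8],
]

negative, positive = top_logit_effects(decoder_vec, unembed, vocab, k=2)
print("most suppressed:", [tok for tok, _ in negative])
print("most promoted:  ", [tok for tok, _ in positive])
```

Read this way, a positive score means the feature pushes the model toward predicting that token when it fires, and a negative score means it pushes away from it.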
Activations Density 0.145%
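Activation density is usually read as the fraction of sampled token positions on which the feature fires at all. A minimal sketch under that assumption follows; the function name `activation_density`, the zero threshold, and the toy activation list are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 2000 token positions, 3 of which activate the feature (illustrative only).
toy_activations = [0.0] * 1997 + [0.4, 1.2, 3.1]
density = activation_density(toy_activations)
print(f"Activation density: {density:.3%}")
```

A value near 0.145% would mean the feature fires on roughly one or two tokens per thousand, which is typical of a sparse, fairly specific feature.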