Explanations
Sentences with positive sentiment, expressions of gratitude, and requests for feedback
Negative Logits
instinct       -0.80
questioning    -0.80
elevated       -0.78
rallying       -0.77
imperson       -0.75
tricked        -0.74
undermining    -0.74
escaping       -0.73
raiding        -0.73
ensibly        -0.73
Positive Logits
Lastly           1.63
<|endoftext|>    1.60
Additionally     1.42
Alternatively    1.39
Also             1.38
Anyway           1.30
Finally          1.24
Please           1.22
Enjoy            1.21
Otherwise        1.19
Activation density: 4.496%
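On dashboards of this kind, the logit lists and the density figure are commonly derived as sketched below. This is a minimal pure-Python sketch under stated assumptions, not this dashboard's actual pipeline. It assumes the positive and negative logits are the feature's decoder direction projected through the model's unembedding matrix, with the highest and lowest results kept. It also assumes density is the fraction of sampled token positions where the feature's activation is above zero. The function names `top_logits` and `activation_density` and the toy inputs are hypothetical. On that reading, 4.496% would mean the feature fires on roughly 45 of every 1,000 tokens sampled.

```python
def top_logits(decoder_dir, unembed, vocab, k):
    """Return the k most positive and k most negative tokens by logit effect.

    Assumption: the logit effect of token j is the dot product of the
    feature's decoder vector with column j of the unembedding matrix.
    decoder_dir: list of d_model floats (hypothetical decoder vector).
    unembed: d_model rows, each a list of len(vocab) floats.
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    # effect[j] = sum over i of decoder_dir[i] * unembed[i][j]
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by effect: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy usage with made-up numbers (not taken from this feature).
pos, neg = top_logits(
    [1.0, -0.5],
    [[0.2, -0.4, 1.0], [0.6, 0.2, -0.2]],
    ["tok0", "tok1", "tok2"],
    k=1,
)
print(pos, neg)
print(activation_density([0.0, 0.3, 0.0, 1.2]))  # 0.5
```

The `threshold` parameter is an illustrative knob. Some dashboards count any nonzero activation as "active", while others use a small positive cutoff.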