INDEX
Explanations
phrases ending with proper punctuation
instances of strong emotional or impactful language
Negative Logits
withdraw    -0.82
ignt        -0.75
citiz       -0.73
conclude    -0.71
ertodd      -0.69
gements     -0.69
detract     -0.68
uphold      -0.68
dissu       -0.67
prosec      -0.66
Positive Logits
Consider    0.92
Sure        0.89
SHARE       0.86
Seriously   0.86
Unless      0.85
Published   0.84
Actor       0.82
Fans        0.81
DON         0.79
Except      0.79
Activations Density: 0.484%