INDEX
Explanations
first person personal pronouns followed by positive sentiment
phrases centered around personal experience and perspective
Negative Logits
batch         -0.82
pak           -0.73
itialized     -0.69
orously       -0.69
vertisements  -0.69
Haunted       -0.69
compatible    -0.68
flush         -0.68
quartered     -0.68
mosp          -0.67
Positive Logits
sake          1.33
purposes      1.22
selves        0.93
ummies        0.92
reasons       0.88
liking        0.82
personally    0.80
purpose       0.78
own           0.71
dear          0.70
Activations Density: 0.096%