INDEX
Explanations
phrases indicating personal opinions or declarations
phrases that express personal opinions or assertions
Negative Logits
estern        -0.72
ulent         -0.68
artments      -0.66
Loading       -0.62
poons         -0.62
pes           -0.61
Written       -0.61
Newsletter    -0.61
poon          -0.61
Ku            -0.60
Positive Logits
goodbye            1.43
hello              1.09
Goodbye            1.02
congratulations    0.98
thank              0.95
congr              0.92
farewell           0.91
that               0.90
unequivocally      0.89
THANK              0.88
Activation Density: 0.075%