INDEX
Explanations
instances where the text is soliciting actions or reactions from the reader
Negative Logits
Token            Logit
consecut         -0.69
SPONSORED        -0.69
unfavorable      -0.68
attributed       -0.63
contradictory    -0.63
urrent           -0.62
uitous           -0.61
Examination      -0.60
juxtap           -0.60
interchange      -0.60
Positive Logits
Token            Logit
!:               0.97
!.               0.92
!                0.87
rejoice          0.86
ya               0.84
!,               0.81
!'               0.78
!!               0.78
!!!              0.76
!".              0.75
Activations Density 0.787%