Explanations
comparisons, particularly those referencing competitive matchups
comparative structures and phrases detailing competition or rivalry
Negative Logits
Token          Logit
).[            -0.77
".[            -0.62
."[            -0.61
moreover       -0.54
Slate          -0.53
lobbying       -0.49
nonetheless    -0.48
SetTextColor   -0.47
versely        -0.47
sshd           -0.47
Positive Logits
Token          Logit
Replay         0.63
?'             0.62
?",            0.59
;)             0.59
',             0.56
aturday        0.53
iru            0.53
\'             0.52
*/             0.52
haha           0.51
Activations Density 2.970%
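
The density figure above can be read as the share of token positions on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold of zero); the function name `activation_density` and the sample values are hypothetical and are not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (here: strictly greater than threshold) activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 10 token positions (illustrative only).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # same percentage formatting as the figure above
```

Under this reading, a density of 2.970% would mean the feature is active on roughly 3 of every 100 sampled token positions.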