INDEX
Explanations
phrases indicating contrasting or contradictory statements
affirmations or expressions of agreement
Negative Logits
Token           Logit
]:              -0.79
lier            -0.76
ESE             -0.72
interstitial    -0.71
uten            -0.69
Connector       -0.69
backer          -0.69
]).             -0.68
opsis           -0.68
ipel            -0.68
Positive Logits
Token           Logit
admittedly      0.98
technically     0.95
disagree        0.89
inconvenience   0.79
but             0.77
imperfect       0.76
superf          0.75
laud            0.74
initially       0.73
merits          0.71
Activations Density: 0.607%