INDEX
Explanations
phrases indicating a comparison between different options or sides of an argument
phrases contrasting two sides of an argument or situation
Negative Logits
MJ          -0.67
brook       -0.67
DERR        -0.64
laun        -0.63
nance       -0.63
MSN         -0.62
ciating     -0.62
é¾          -0.62
rations     -0.61
renches     -0.60
Positive Logits
accuser     0.84
accus       0.62
accusing    0.62
uthor       0.61
alone       0.61
Manafort    0.59
applaud     0.58
lihood      0.58
Reps        0.57
icka        0.57
Activations Density 0.035%
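
For readers unfamiliar with the density figure, here is a minimal sketch of one common way to compute such a number. It assumes density means the fraction of token positions on which the feature's activation is above zero. The source does not state its exact definition, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    fires at all; the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active positions out of 20,000 tokens,
# chosen only to illustrate a density of the same order as the one reported.
sample = [0.0] * 19_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which is typical of sparse features.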