INDEX
Explanations
phrases related to giving advice or instructions
phrases indicating suggestions or recommendations
Negative Logits
argues        -0.84
acknowledges  -0.67
contends      -0.67
advocates     -0.65
asserts       -0.65
cite          -0.64
ensures       -0.63
ths           -0.63
Polit         -0.62
cites         -0.62
Positive Logits
apest    0.77
orage    0.71
—"       0.70
…"       0.70
…"       0.69
…."      0.66
mosqu    0.64
conflic  0.63
fitt     0.62
prank    0.62
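Feature dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative entries. The sketch below assumes that procedure; the function name `top_bottom_logits`, the argument names `decoder_dir` and `W_U`, and the one-dimensional toy matrix are hypothetical and for illustration only, not the dashboard's actual data or code.

```python
import numpy as np

def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through the unembedding.

    Assumes decoder_dir has shape (d_model,) and W_U has shape
    (d_model, vocab_size). Returns (positive, negative) lists of
    (token, logit) pairs, each of length k.
    """
    logits = decoder_dir @ W_U          # one logit contribution per vocab token
    order = np.argsort(logits)          # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy example: d_model = 1, four-token vocabulary.
vocab = ["argues", "apest", "orage", "cites"]
W_U = np.array([[-0.84, 0.77, 0.71, -0.62]])
decoder_dir = np.array([1.0])
pos, neg = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print(pos)
print(neg)
```

Sorting once and slicing from both ends avoids a second pass over the vocabulary.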
Activation Density: 1.637%
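Activation density is usually reported as the percentage of token positions at which the feature fires, i.e. has an activation above zero. The sketch below assumes that definition and a threshold of 0; the function name `activation_density` and the toy activation values are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

# Hypothetical toy activations: the feature fires on 2 of 8 tokens.
toy_acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
density = activation_density(toy_acts)
print(f"{density:.3f}%")  # prints 25.000%
```

Under this definition, the reported 1.637% would mean the feature is active on roughly 1 to 2 tokens out of every 100 in the evaluation set.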