INDEX
Explanations
references to different options or alternatives
conjunctions that connect contrasting or alternative ideas
Negative Logits
ires       -0.72
efer       -0.71
onday      -0.67
ETS        -0.65
Pony       -0.63
elson      -0.63
NOW        -0.63
Hoo        -0.63
legraph    -0.61
ocracy     -0.61
Positive Logits
acle       1.38
nam        1.30
acles      1.28
chid       1.18
otherwise  1.12
Else       1.11
chard      1.08
ifice      1.08
acular     1.05
nery       0.95
Activations Density: 0.151%