INDEX
Explanations
negative evaluations or criticisms
negative assessments or criticisms of arguments and positions
Negative Logits
gins         -0.71
interrupted  -0.69
ahs          -0.69
downed       -0.66
ien          -0.65
rollers      -0.65
dreaded      -0.63
ominated     -0.62
Phones       -0.62
mins         -0.61
Positive Logits
insofar      0.96
because      0.87
extrap       0.83
unless       0.81
ctive        0.77
considering  0.77
enough       0.76
nonetheless  0.76
nonsense     0.74
headed       0.74
Activations Density 0.151%