INDEX
Explanations
phrases indicating certainty or emphasis
phrases asserting impossibility or the absence of a method
Negative Logits
ilts      -0.81
rongh     -0.81
asts      -0.79
eg        -0.79
livest    -0.78
igl       -0.70
aples     -0.69
origin    -0.69
lux       -0.68
itton     -0.68
Positive Logits
whatsoever    0.92
else          0.77
anybody       0.77
anyone        0.72
Reviewer      0.72
anymore       0.70
ouver         0.66
THEY          0.66
point         0.64
around        0.62
Activations Density 0.036%