Explanations
phrases indicating the need to ensure something is done
reassuring phrases or instructions emphasizing the importance of caution and thoroughness
Negative Logits
Token       Logit
igmatic     -0.72
ufact       -0.70
oub         -0.69
pione       -0.68
âĸ¬         -0.66
option      -0.65
obb         -0.65
elta        -0.65
derog       -0.64
Flavoring   -0.64
Positive Logits
Token       Logit
they        0.94
everyone    0.93
nobody      0.92
everything  0.92
everybody   0.90
you         0.90
we          0.86
that        0.83
there       0.80
it          0.73
Activations Density: 0.031%
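
For context, an activation density figure like this one is commonly read as the share of sampled tokens on which the feature fires. The page does not say how its value was computed, so the following is only a minimal sketch under that assumption. The names `activation_density` and `as_percent`, the zero threshold, and the sample values are all hypothetical and not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a token counts as "active" when the feature's activation
    on it exceeds the threshold (zero by default).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


def as_percent(fraction: float, digits: int = 3) -> str:
    """Format a fraction as a percentage string with `digits` decimals."""
    return f"{fraction * 100:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical sample: 10,000 token positions, 3 of them active.
    sample = [0.0] * 9_997 + [1.2, 0.4, 3.1]
    print(as_percent(activation_density(sample)))
```

Under this reading, a density of 0.031% would correspond to roughly 3 active tokens in every 10,000 sampled, which is typical of a sparse feature.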