Explanations
actionable phrases or directives
imperative phrases and commands
Negative Logits
ersen     -0.51
inous     -0.50
bys       -0.49
plent     -0.48
pez       -0.48
odox      -0.48
eatured   -0.47
izoph     -0.47
uesday    -0.46
eatures   -0.45
Positive Logits
to        0.82
To        0.73
to        0.72
To        0.71
.         0.71
.*        0.65
.''.      0.65
Created   0.62
.''       0.61
!         0.60
Activations Density 0.984%