Explanations
statements or actions indicating progress or change
phrases indicating actions taken or steps proposed in various contexts
Negative Logits
Token     Logit
Corpus    -0.75
Sheep     -0.69
Hitch     -0.68
Chains    -0.67
Waste     -0.66
Bie       -0.66
Anch      -0.65
ench      -0.64
peas      -0.63
Lies      -0.63
Positive Logits
Token          Logit
toward         0.81
precautions    0.77
ndum           0.74
remed          0.74
steps          0.73
backward       0.72
ãĤ¸            0.71
towards        0.70
proactive      0.68
offline        0.68
Activations Density 0.064%