INDEX
Explanations
phrases instructing or encouraging actions
commands or instructions that encourage action or engagement
Negative Logits
emale     -0.71
edge      -0.70
album     -0.69
Reply     -0.61
ungle     -0.61
iege      -0.59
hell      -0.58
eller     -0.58
burden    -0.57
apo       -0.57
Positive Logits
yourselves    1.05
yourself      1.02
ings          0.85
Yourself      0.83
ardless       0.81
thou          0.79
ye            0.75
ya            0.71
Tata          0.71
able          0.68
Activations Density: 0.209%
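As a rough illustration of how a density figure like the one above could be computed, here is a minimal sketch. It assumes density means the fraction of token positions at which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample counts are hypothetical, chosen only so the arithmetic lands on the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" counts a position as active when its
    activation is strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample, not the data behind this page:
# 100,000 token positions, 209 of which have a nonzero activation.
sample = [0.0] * 99_791 + [1.5] * 209
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density near 0.2% means the feature fires on roughly one token in five hundred, which would fit a feature tied to a narrow family of tokens such as the reflexive and second-person forms listed under Positive Logits.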