INDEX
Explanations
phrases indicating initial actions or steps toward achieving solutions
Negative Logits
Guid         -0.15
orman        -0.15
374          -0.15
usercontent  -0.14
fate         -0.14
anten        -0.14
uben         -0.13
alles        -0.13
aman         -0.13
STANCE       -0.13
Positive Logits
starters     0.21
step         0.20
begin        0.20
Partial      0.19
start        0.19
partial      0.18
begins       0.18
Towards      0.17
progress     0.17
Begin        0.17
Activations Density 0.192%