INDEX
Explanations
phrases indicating ambition or dedication to achieving goals
Negative Logits
atee     -0.15
/up      -0.15
most     -0.15
weg      -0.15
xit      -0.14
k        -0.14
nee      -0.14
ega      -0.14
/by      -0.14
ne       -0.14
Positive Logits
harder   0.25
towards  0.24
toward   0.23
hardest  0.23
-hard    0.22
Towards  0.20
hard     0.19
hard     0.19
Towards  0.18
HARD     0.18
Activations Density 0.009%