INDEX
Explanations
phrases that indicate methods, instructions, or ways to accomplish tasks
Negative Logits
Token       Logit
only        -0.15
Favor       -0.14
See         -0.14
whatever    -0.14
endar       -0.14
çľĭçľĭ      -0.14
dna         -0.14
Need        -0.14
Don         -0.13
ando        -0.13
Positive Logits
Token       Logit
best        0.34
best        0.30
proceed     0.28
Proceed     0.27
BEST        0.25
-best       0.24
(best       0.24
Best        0.24
approach    0.23
_best       0.23
Activations Density 0.089%
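
The density figure above is most naturally read as the share of token positions on which the feature fires. The following is a minimal sketch of that calculation, assuming density means the fraction of positions whose activation is above zero; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical, made up purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the
    feature is active, which may differ from the dashboard's definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 89 active positions out of 100,000 tokens.
sample = [0.0] * 99_911 + [0.5] * 89
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

With this made-up sample, the feature is active on 89 of 100,000 positions, which is the same order of sparsity as the value reported above.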