Explanations
phrases indicating progress or alignment towards goals
Negative Logits
formace    -0.15
anches     -0.14
beg        -0.14
obus       -0.14
andes      -0.14
apia       -0.14
oba        -0.14
晨         -0.14
oucher     -0.13
вести      -0.13
Positive Logits
track      0.36
target     0.31
course     0.30
track      0.28
tract      0.27
schedule   0.27
pace       0.26
-track     0.25
target     0.24
Track      0.24
Activations Density 0.050%