INDEX
Explanations
phrases indicating progress and completion of tasks or events
Negative Logits
Token        Logit
never        -0.20
immediately  -0.19
still        -0.18
still        -0.18
sofort       -0.18
masih        -0.18
stayed       -0.18
NEVER        -0.17
remain       -0.17
quickly      -0.17
Positive Logits
Token        Logit
fully        0.28
finishes     0.27
finish       0.27
stabil       0.23
finished     0.22
finishing    0.21
figure       0.21
complete     0.20
Fully        0.20
hopefully    0.20
Activations Density: 0.283%