INDEX
    Explanations

    phrases indicating progress and completion of tasks or events

    Negative Logits
    Token             Logit
    " never"          -0.20
    " immediately"    -0.19
    " still"          -0.18
    "still"           -0.18
    " sofort"         -0.18
    " masih"          -0.18
    " stayed"         -0.18
    " NEVER"          -0.17
    "remain"          -0.17
    " quickly"        -0.17
    Positive Logits
    Token             Logit
    " fully"          0.28
    " finishes"       0.27
    " finish"         0.27
    " stabil"         0.23
    " finished"       0.22
    " finishing"      0.21
    " figure"         0.21
    " complete"       0.20
    " Fully"          0.20
    " hopefully"      0.20
    Act Density 0.283%

    No Known Activations