INDEX
    Explanations

    phrases about taking action and making changes

    Negative Logits
    Token         Logit
    "osaur"       -0.15
    "avit"        -0.15
    "ousse"       -0.14
    "jos"         -0.14
    "cassert"     -0.14
    "enary"       -0.14
    "GX"          -0.14
    "ubah"        -0.14
    "agh"         -0.14
    "ahoma"       -0.14
    Positive Logits
    Token         Logit
    " again"      0.33
    " Again"      0.24
    "again"       0.23
    "再"          0.21
    "_again"      0.21
    "Again"       0.21
    " improved"   0.20
    " better"     0.20
    " lại"        0.20
    " novamente"  0.19
    Activation Density 0.210%

    No Known Activations