    Explanations

    words related to persistence or continuity over time

    Negative Logits
        "already"       -0.17
        " Already"      -0.17
        " ARISING"      -0.17
        "гал"           -0.16
        "Already"       -0.16
        "oland"         -0.16
        " artık"        -0.16
        " already"      -0.16
        "gone"          -0.14
        "onec"          -0.14
    Positive Logits
        " unchanged"    0.32
        " intact"       0.31
        "ders"          0.29
        " steadfast"    0.27
        " constant"     0.26
        " untouched"    0.25
        " faithful"     0.23
        " unaffected"   0.23
        " true"         0.22
        " steady"       0.22
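On dashboards of this kind, the logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that convention; `logit_effects`, `top_and_bottom`, and the toy inputs are hypothetical and not part of any dashboard API, and a real pipeline may add normalization steps not shown here.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir @ unembed: one logit contribution per vocabulary token.

    Assumed shapes: decoder_dir has length d_model; unembed has d_model rows
    and vocab_size columns.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembedding rows")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(effects)), key=lambda j: effects[j])      # ascending by value
    negative = [(vocab[j], effects[j]) for j in order[:k]]             # most negative first
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]  # most positive first
    return negative, positive


# Toy example with made-up numbers (hypothetical d_model = 2, vocabulary of 4 tokens).
vocab = [" steady", " already", " unchanged", " intact"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -0.1, 0.3, 0.0],
           [0.1, -0.2, 0.0, 0.4]]
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print(negative)
print(positive)
```

Sorting indices once and slicing both ends keeps the negative and positive lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized matrix product would replace the nested loops.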
    Activation Density 0.042%
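Activation density is commonly read as the share of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above zero; the `threshold` parameter and both function names are hypothetical. Under that reading, a density of 0.042% corresponds to roughly 420 active tokens per million, so the feature fires rarely.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def expected_firings(density_percent: float, num_tokens: int) -> float:
    """Convert a density given in percent into an expected count of active tokens."""
    return density_percent / 100.0 * num_tokens


# 0.042% of one million tokens is about 420 active tokens.
print(round(expected_firings(0.042, 1_000_000)))
# A made-up activation trace: 1 active token out of 4 gives a density of 0.25 (25%).
print(activation_density([0.0, 0.0, 1.5, 0.0]))
```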

    No Known Activations