INDEX
    Explanations

    words indicating continuance or persistence

    Negative Logits
    " Already"    -0.17
    "already"     -0.16
    "oland"       -0.16
    " already"    -0.16
    "portun"      -0.15
    "Already"     -0.15
    " már"        -0.15
    "alat"        -0.15
    "テル"        -0.15
    "гал"         -0.15
    Positive Logits
    "ders"        0.31
    " constant"   0.24
    " true"       0.23
    " intact"     0.23
    " unchanged"  0.22
    "(ed"         0.20
    " faithful"   0.20
    " committed"  0.19
    " steadfast"  0.18
    " alive"      0.18
    Activation Density: 0.033%

    No Known Activations