INDEX
    Explanations

    words that indicate strong actions, such as those related to affirmations, proposals, and findings

    Negative Logits
    ToLocal     -0.17
    erts        -0.16
    inka        -0.15
    cake        -0.15
     dabei      -0.15
    еÑĢÑĤа      -0.15
    uib         -0.14
    ved         -0.14
    assin       -0.14
    roke        -0.14
    Positive Logits
     already    0.23
    already     0.23
     Already    0.20
    even        0.18
    Already     0.18
     even       0.18
    _already    0.18
    جار         0.16
     elsewhere  0.16
     EVEN       0.16
    Activation Density 0.005%

    No Known Activations