INDEX
    Explanations

    phrases indicating actions or intentions, particularly those starting with "to."

    Negative Logits
    ovah     -0.16
    uchs     -0.16
    raya     -0.16
    otate    -0.16
    loys     -0.15
    oland    -0.15
    asons    -0.15
    &uuml    -0.15
    ouden    -0.15
    ctica    -0.14
    Positive Logits
    kol      0.14
    Äħż      0.14
    ustr     0.14
     Brewer  0.14
    arsi     0.13
     rot     0.13
     Flo     0.13
    Crow     0.13
    angu     0.13
    453      0.13
    Activation Density 0.039%

    No Known Activations