    Explanations

    words associated with actions or decisions made by characters

    Negative Logits
        Token       Logit
        "gow"       -0.17
        "esub"      -0.15
        "apter"     -0.15
        "tick"      -0.15
        "anvas"     -0.14
        " пода"     -0.13
        "tır"       -0.13
        "tit"       -0.13
        "elve"      -0.13
        " Naw"      -0.13
    Positive Logits
        Token       Logit
        " kön"      0.32
        " dür"      0.29
        " können"   0.29
        " könnte"   0.27
        " lassen"   0.26
        " wollen"   0.24
        " mö"       0.23
        " mü"       0.23
        " möchten"  0.23
        " sollen"   0.22
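    Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the result. The sketch below assumes that convention: `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, no final LayerNorm scaling is applied, and the toy weights are illustrative only. It is not necessarily how this dashboard computed its values.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how much a feature direction raises or lowers their logits.

    decoder_dir: shape (d_model,), the feature's decoder vector (assumed input).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed input).
    vocab: list of token strings of length vocab_size.
    Note: this sketch ignores any final LayerNorm applied before unembedding.
    """
    effects = decoder_dir @ W_U          # direct logit effect per token, shape (vocab_size,)
    order = np.argsort(effects)          # token indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical weights).
vocab = [" können", "gow", "tick", " sollen"]
W_U = np.array([[1.0, -0.5, -1.0, 0.5],
                [0.0,  0.0,  0.0, 0.5]])
decoder_dir = np.array([0.3, 0.1])

positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting once with `argsort` and reading both ends gives the positive and negative lists from a single pass over the vocabulary.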
    Activation Density: 0.013%
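    A density of 0.013% means the feature fires on roughly 13 of every 100,000 tokens, or about 1.3 per 10,000. The sketch below assumes density is the percentage of token positions whose activation is strictly above zero over some evaluation corpus; the function name `activation_density` and the `threshold` parameter are hypothetical, and the corpus this dashboard used is not stated.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical example: the feature fires on 13 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:13] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```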

    No Known Activations