INDEX
    Explanations

    references to significant actions or decisions and their importance within the context

    Negative Logits
    Token         Logit
    "actics"      -0.17
    "awan"        -0.16
    "\<^"         -0.16
    "ories"       -0.15
    "geç"         -0.14
    "ãģĹãĤĩ"      -0.14
    "ihat"        -0.14
    "abbo"        -0.13
    "itis"        -0.13
    " Antwort"    -0.13
    Positive Logits
    Token              Logit
    " mistake"         0.32
    " mistakes"        0.27
    " distinction"     0.26
    " noises"          0.26
    " acquaintance"    0.26
    " contribution"    0.25
    " decisions"       0.25
    " connection"      0.24
    " adjustments"     0.24
    " decision"        0.24
    Activation Density: 0.115%

    No Known Activations