    Explanations

    phrases that indicate intention or action

    Negative Logits
    Token            Logit
    "/Common"        -0.15
    " Happy"         -0.15
    "weg"            -0.14
    "essor"          -0.14
    " er"            -0.14
    " j"             -0.13
    "ج"              -0.13
    "/loose"         -0.13
    " Mult"          -0.13
    "CO"             -0.13

    Positive Logits
    Token            Logit
    "ieder"          0.16
    "Uvs"            0.16
    "IFn"            0.15
    "afari"          0.15
    "KNOWN"          0.15
    "лит"            0.14
    "èm"             0.14
    " interchange"   0.14
    "aan"            0.14
    "atatype"        0.14
    Activation density: 0.156%
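
    Activation density is the share of tokens on which the feature is active. At 0.156%, this feature fires on roughly 156 of every 100,000 tokens. The following is a minimal sketch that assumes "active" means an activation strictly above zero, a common convention that this page does not confirm. The helper `activation_density` is hypothetical.

```python
from typing import Iterable

# Hypothetical helper. Assumes a token counts as "active" when the
# feature's activation is strictly greater than `threshold` (default 0).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which the feature is active."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


# Toy example: 156 active tokens out of 100,000 gives 0.156%.
acts = [0.0] * 99_844 + [1.0] * 156
density = activation_density(acts)
```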

    No Known Activations