INDEX
    Explanations

    states, conditions, and concepts

    NEGATIVE LOGITS
    Token     Value
    " a"      1.12
    "at"      1.02
    "and"     0.96
    "อบ"      0.84
    "f"       0.82
    "j"       0.80
    "dro"     0.79
    " do"     0.78
    " de"     0.77
    "all"     0.73
    POSITIVE LOGITS
    Token          Value
    "ز"            1.04
    "н"            1.01
    "т"            0.96
    ":"            0.91
    "'"            0.88
    "д"            0.85
    "з"            0.84
    " 불구하고"    0.76
    "г"            0.75
    "х"            0.75
    Activation Density: 0.001%

    No Known Activations