INDEX
    Explanations

    abbreviations and acronyms

    NEGATIVE LOGITS
    Token          Value
    " какой"       1.00
    "ور"           0.99
    "шт"           0.96
    " которым"     0.96
    "мес"          0.95
    "کند"          0.93
    "êmement"      0.92
    " който"       0.91
    "多人"         0.89
    "способ"       0.88
    POSITIVE LOGITS
    Token          Value
    "s"            1.40
    "d"            1.36
    "the"          1.24
    "The"          1.20
    "L"            1.17
    "she"          1.12
    "F"            1.11
    "T"            1.09
    "dem"          1.08
    "S"            1.08
    Activation Density: 0.051%

    No Known Activations