INDEX
    Explanations

    abbreviations and acronyms

    Negative Logits
    u      0.86
    al     0.85
    in     0.83
    y      0.79
    on     0.78
    r      0.76
    er     0.73
    an     0.73
    at     0.73
    o      0.70
    Positive Logits
    İ      0.99
    Ö      0.89
    UCK    0.88
    Ü      0.84
    Б      0.80
    Ş      0.80
    ӧ      0.80
    OTE    0.78
    BN     0.78
    ICK    0.78
    Act Density 0.643%

    No Known Activations