INDEX
    Explanations

    Negative Logits
    er      0.64
    em      0.52
    u       0.48
    ed      0.43
    erà     0.43
    ad      0.43
    ার      0.42
    x       0.41
    غ       0.40
    en      0.39

    Positive Logits
    м       0.42
     is     0.35
     clim   0.34
    т       0.34
    чен     0.33
    t       0.33
    ت       0.32
    пан     0.32
    ير      0.32
     nies   0.31

    Act Density 2.138%

    No Known Activations