INDEX
    NEGATIVE LOGITS
    "ς"           0.36
    "ı"           0.35
    " jeste"      0.34
    "หรือ"         0.34
    "но"          0.33
    " I"          0.33
                  0.32
    " ç"          0.32
    "おそらく"     0.32
    "ρα"          0.31
    POSITIVE LOGITS
    "in"          0.54
    "n"           0.42
    "as"          0.42
    "r"           0.41
    "ad"          0.41
    "et"          0.41
    "c"           0.40
    "t"           0.40
    "g"           0.39
    "y"           0.38
    Act Density 0.290%

    No Known Activations