INDEX
    Explanations
    Negative Logits
    "p"          1.52
    " Από"       1.33
    "il"         1.32
    "x"          1.23
    "on"         1.22
    "ik"         1.20
    "for"        1.20
    "ب"          1.20
    " frutas"    1.19
    "0"          1.18
    POSITIVE LOGITS
    "t"          1.16
    "یی"         1.15
    "ра"         1.13
    "с"          1.13
    "ır"         1.02
    "в"          1.02
    "ми"         1.02
                 0.98
                 0.96
    "сны"        0.95
    Act Density 0.005%

    No Known Activations