INDEX
    Explanations

    assure quality and correctness

    Negative Logits
    Token     Logit
    "af"      1.08
    " ("      0.95
    "ik"      0.90
    "т"       0.83
    "ت"       0.82
    " on"     0.82
    " "       0.78
    "ion"     0.78
    "nels"    0.77
    "ain"     0.76
    Positive Logits
    Token     Logit
    "۰"       1.63
              1.22
              1.17
    "ні"      1.09
              1.07
    "ع"       1.06
    " નામ"    1.00
              0.99
    "その"    0.98
              0.96
    Activation Density: 0.002%

    No Known Activations