INDEX
    Explanations
    Negative Logits
    in       1.16
    of       0.87
    are      0.84
    were     0.80
    iname    0.76
    it       0.73
    ak       0.73
    p        0.72
    am       0.72
    je       0.72
    Positive Logits
    ри       0.87
    ;        0.87
             0.66
    IMUM     0.63
    سم       0.62
    يز       0.61
    ส่ง       0.61
    ).$$     0.61
    ÂN       0.60
    ט        0.60
    Act Density 0.090%

    No Known Activations