INDEX
    Explanations
    Negative Logits
    als      0.73
    rends    0.73
    ts       0.72
    ata      0.71
    se       0.70
    rie      0.70
    í        0.69
    ets      0.68
    sk       0.68
    ti       0.67
    Positive Logits
    (token not rendered)    1.07
    ه                       1.04
    ה                       0.80
    (token not rendered)    0.73
    a                       0.72
    (token not rendered)    0.68
    ة                       0.64
    (token not rendered)    0.61
    )\                      0.60
    ));                     0.59
    Activation Density 0.046%

    No Known Activations