INDEX
    Explanations

    modified version, sentence, molecule, activity

    Negative Logits
    Token     Logit
    "8"       1.59
    " are"    1.41
    " is"     1.39
    "ला"      1.23
    "ש"       1.20
    "7"       1.13
    "ana"     1.09
    "OS"      1.07
    " μ"      1.05
    " ۸"      1.01
    Positive Logits
    Token            Logit
    "ו"              2.05
    "an"             1.60
    "و"              1.60
    "م"              1.41
    "b"              1.38
    (token missing)  1.36
    (token missing)  1.36
    "ل"              1.35
    "u"              1.34
    "ul"             1.32
    Act Density 0.013%

    No Known Activations