    Negative Logits
        Token      Value
        (blank)    1.25
        "ne"       1.11
        "ה"        1.07
        (blank)    1.06
        "ll"       1.03
        (blank)    1.01
        "ل"        0.98
        (blank)    0.90
        "ng"       0.88
        "ه"        0.87
    Positive Logits
        Token      Value
        " a"       1.23
        "Lip"      1.13
        "ir"       1.00
        (blank)    0.93
        " Lip"     0.90
        "M"        0.90
        "ков"      0.86
        "lipid"    0.86
        "omena"    0.84
        "1"        0.83
    Activation Density: 0.004%

    No Known Activations