INDEX
    Explanations
    No Explanations Found
    Negative Logits
    י    1.20
     danni    1.10
    Have    1.05
     aprob    1.05
     haue    1.00
    ל    1.00
     alimentare    0.99
    IL    0.98
     miliardi    0.98
    ן    0.98
    Positive Logits
     to    1.37
    ig    1.37
    us    1.37
    ad    1.33
    em    1.30
    im    1.28
    íme    1.26
    ä    1.25
    te    1.16
    ning    1.13
    Act Density 0.000%

    No Known Activations