    Negative Logits
    um          0.63
    iaal        0.57
    ées         0.57
    iril        0.57
    umine       0.54
    isches      0.53
    ône         0.52
    éraires     0.52
     عليه       0.51
    romes       0.51
    Positive Logits
    IS          0.64
    ה           0.64
    a           0.59
    IN          0.58
    EA          0.57
    ELI         0.57
    AW          0.56
    EL          0.54
    on          0.53
    AR          0.53
    Activation Density: 0.000%

    No Known Activations