    Explanations

    **avoid** intention already key

    Negative Logits
    Token         Logit
    " of"         1.13
    "Т"           1.00
    " at"         0.93
    "О"           0.92
    "ב"           0.92
    " ovale"      0.89
    "ular"        0.88
    "ing"         0.88
    " anglais"    0.88
    "-"           0.88
    Positive Logits
    Token         Logit
                  1.14
    "!)"          0.75
    " ነው"         0.69
    "contacto"    0.68
    "in"          0.64
    "ha"          0.63
    "hemm"        0.63
    " पाहून"       0.62
    "hol"         0.62
    "が増"        0.62
    Act Density 0.706%

    No Known Activations