    Explanations

    explanations of reasons why

    Negative Logits
     uomo              0.99
     kommen            0.95
     penatibus         0.95
    -------------      0.93
                       0.92
     Behold            0.91
     Besitz            0.91
    u                  0.91
    Bad                0.89
    人都                0.89
    Positive Logits
     reasons           1.23
     reason            1.22
     why               1.20
    Reasons            1.17
     warum             1.11
     detrás            1.03
     razón             1.02
     السبب             0.98
     bypass            0.97
    यंस                0.95
    Act Density 0.272%

    No Known Activations