INDEX
    Explanations
    Negative Logits
    Token           Logit
    " causing"      -0.14
    " causando"     -0.14
    "adds"          -0.13
    " helping"      -0.13
    "allows"        -0.12
    " Enables"      -0.12
    " Allows"       -0.12
    "Adds"          -0.12
    " Helps"        -0.12
    "Allows"        -0.12
    Positive Logits
    Token           Logit
    " produced"     0.11
    " deliver"      0.10
                    0.10
    " given"        0.10
    " ارائه"        0.09
    " delivered"    0.09
    "y"             0.09
    "g"             0.09
    " Ou"           0.09
    " give"         0.09
    Activation Density: 0.110%

    No Known Activations