INDEX
    Explanations

    concepts related to influence and impact

    Negative Logits
    Token          Logit
    "rong"         -0.17
    "uele"         -0.15
    "ngr"          -0.15
    "rans"         -0.14
    "abad"         -0.14
    "amber"        -0.14
    " Particip"    -0.14
    "lou"          -0.14
    " deem"        -0.14
    "awa"          -0.13
    Positive Logits
    Token          Logit
    " bring"       0.21
    " brings"      0.18
    " inf"         0.18
    " bringing"    0.16
    "indr"         0.16
    "aped"         0.16
    " Bring"       0.16
    "-INF"         0.15
    " create"      0.15
    " prod"        0.15
    Activation Density: 0.133%

    No Known Activations