    Negative Logits
    Token                 Logit
    "(down"               -0.07
    (no visible token)    -0.07
    "on"                  -0.07
    " робіт"              -0.07
    (no visible token)    -0.07
    " stimulated"         -0.07
    "money"               -0.07
    " Glory"              -0.07
    " stunt"              -0.07
    "ו�"                  -0.07
    Positive Logits
    Token                 Logit
    " Each"               0.12
    " each"               0.12
    "Each"                0.10
    "each"                0.10
    " forEach"            0.09
    " JC"                 0.09
    ".each"               0.08
    "_each"               0.08
    " EACH"               0.08
    "ACS"                 0.08
    Activation Density: 0.060%

    No Known Activations