INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    "🅓"                 -0.07
    " Cause"            -0.07
    " selectedIndex"    -0.07
    " pest"             -0.07
    " Accent"           -0.07
    " Masc"             -0.07
                        -0.06
    " Anyone"           -0.06
    " Bew"              -0.06
    ".SK"               -0.06
    POSITIVE LOGITS
    Token               Logit
    "ира"               0.08
    "(Image"            0.07
    " тоже"             0.07
    "_MINOR"            0.07
    "呕吐"              0.07
    "特有的"            0.07
    " racially"         0.07
    "Ě"                 0.07
    "ição"              0.07
    " Featuring"        0.07
    Activation Density: 0.018%

    No Known Activations