INDEX
    Explanations
    NEGATIVE LOGITS
    ãĤªãĥª     -0.14
    riz        -0.14
    ispens     -0.14
    ÑĢÑĥб      -0.13
     Reform    -0.13
    chief      -0.13
     Opr       -0.13
     resett    -0.13
    .link      -0.13
     magg      -0.13
    POSITIVE LOGITS
    dre        0.17
    anga       0.16
    swer       0.16
    ÙĩÙĨ       0.15
    acoes      0.15
    acao       0.14
    iren       0.14
    iversit    0.14
    018        0.14
    اص         0.14
    Act Density 0.028%

    No Known Activations