INDEX
    Explanations

    articles and determiners in the text

    NEGATIVE LOGITS
    anou        -0.17
    amerate     -0.15
    ppo         -0.15
    arih        -0.14
    ulner       -0.14
    株          -0.14
    řím         -0.14
     Vš         -0.14
    ียด          -0.14
    Před        -0.14

    POSITIVE LOGITS
     par        0.17
     Bernstein  0.16
     role       0.15
    jet         0.15
     Thomson    0.14
     positive   0.13
     component  0.13
     c          0.13
     corridor   0.13
     Cent       0.13
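
    Token/logit lists of this kind are typically produced by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the tokens with the largest and smallest results. The page does not state its exact procedure, so what follows is a minimal sketch under that assumption. The helper names (`logit_effects`, `top_and_bottom`), the toy vocabulary, and every number are hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> Dict[str, float]:
    """Dot the feature's decoder direction with every column of the unembedding.

    `unembed` is d_model x vocab_size, so the effect on token j is
    sum_i decoder_dir[i] * unembed[i][j].
    """
    return {
        token: sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j, token in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [" par", " role", "anou", "ppo"]
unembed = [
    [0.5, 0.3, -0.4, -0.2],
    [0.1, 0.2, -0.1, -0.3],
]
decoder_dir = [0.3, 0.2]

effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
for label, rows in (("POSITIVE LOGITS", positive), ("NEGATIVE LOGITS", negative)):
    print(label)
    for token, value in rows:
        # repr() keeps the leading space that marks word-initial tokens
        print(f"{token!r:<12} {value:+.2f}")
```

    Sorting once and slicing both ends keeps the two lists consistent with each other. With a real model the same idea is usually expressed as one matrix-vector product followed by a top-k selection rather than Python loops.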
    Act Density 0.104%
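
    Act Density is usually read as the percentage of evaluated tokens on which the feature's activation is positive, so 0.104% corresponds to roughly one token in a thousand. A minimal sketch under that assumption; the `activation_density` helper, the zero threshold, and the sample data are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 10,000 tokens, of which 10 carry a positive activation.
acts = [0.0] * 10_000
for idx in range(0, 10_000, 1_000):
    acts[idx] = 1.5

density = activation_density(acts)
print(f"Act Density {density:.3f}%")  # → Act Density 0.100%
```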

    No Known Activations