INDEX
    Explanations
    Negative Logits
     lobbyists  -0.07
     Hak        -0.06
    .trigger    -0.06
    _upgrade    -0.06
    lav         -0.06
    ochen       -0.06
    ница        -0.06
    ографія     -0.06
    \Html       -0.06
     hunger     -0.06
    Positive Logits
    ollow       0.08
     italia     0.07
    assist      0.06
     WINDOW     0.06
     elektr     0.06
     tres       0.06
     какие      0.06
     priv       0.06
     dout       0.06
    ffield      0.06
    Activation Density: 0.097%

    No Known Activations