INDEX
    Negative Logits
                -0.06
    "stup"      -0.06
                -0.06
    "ців"       -0.06
                -0.06
    "someone"   -0.06
    "okit"      -0.06
    "евых"      -0.06
                -0.06
    "χής"       -0.06
    Positive Logits
    " inflammatory"   0.08
    "/fa"             0.07
    " editorial"      0.07
    " proudly"        0.07
    " reversible"     0.06
    " discovering"    0.06
    " clearly"        0.06
    " activates"      0.06
    " produced"       0.06
    " out"            0.06
    Act Density 0.016%

    No Known Activations