INDEX
    Negative Logits
    " sne"         -0.08
    "Mono"         -0.07
    "broken"       -0.07
    " acompanh"    -0.07
    " make"        -0.07
    " sorrow"      -0.07
    "<center"      -0.07
    "poke"         -0.06
    " přím"        -0.06
                   -0.06
    Positive Logits
    " us"          0.16
    " Us"          0.12
    "Us"           0.10
    "us"           0.10
    "US"           0.09
    " US"          0.09
    "(us"          0.09
    "AS"           0.08
    " UNS"         0.08
    "-US"          0.08
    Act Density 0.025%

    No Known Activations