INDEX
    Explanations
    Negative Logits
        Token            Logit
        "\tword"         -0.07
        "flies"          -0.06
        " currents"      -0.06
        " trope"         -0.06
        " recommends"    -0.06
        " Inquiry"       -0.06
        " bought"        -0.06
        " weeks"         -0.06
        " derivative"    -0.06
        " Brazilian"     -0.06

    Positive Logits
        Token            Logit
        " UObject"       0.08
        " Yayın"         0.07
        "azer"           0.07
        "(dat"           0.07
        "bou"            0.06
        "редел"          0.06
        "τη"             0.06
        " Jackie"        0.06
        "ync"            0.06
        " в"             0.06

    Activation Density: 0.022%

    No Known Activations