INDEX
    Explanations

    Negative Logits
    "看的"  -0.09
    " olhando"  -0.09
    " nipa"  -0.08
    " hearings"  -0.08
    " הייתי"  -0.08
    " beleza"  -0.08
    "mittedly"  -0.08
    "书记"  -0.08
    " cler"  -0.08
    "ẹwo"  -0.08
    Positive Logits
    "(embed"  0.08
    "sx"  0.08
    " Shiv"  0.08
    "ING"  0.08
    " sito"  0.07
    " W"  0.07
    "(close"  0.07
    " CST"  0.07
    "فات"  0.07
    " Cann"  0.07
    Activation Density: 0.006%

    No Known Activations