INDEX
    Explanations
    Negative Logits
     compost      -0.07
     vans         -0.07
    Teachers      -0.07
    .Question     -0.07
    chat          -0.07
     Candid       -0.06
    fila          -0.06
     Removal      -0.06
     Inspir       -0.06
    TextWriter    -0.06
    Positive Logits
     صف           0.07
     tâm          0.06
     EntryPoint   0.06
    电话          0.06
    ices          0.06
    intendo       0.06
     보기         0.06
     anonym       0.06
                  0.06
    과정          0.06
    Act Density 0.077%

    No Known Activations