    Negative Logits
        Token               Logit
        "=top"              -0.07
        " mime"             -0.07
        " numa"             -0.07
        ",temp"             -0.07
        " holistic"         -0.06
        " پسر"              -0.06
        "디어"              -0.06
        " DFS"              -0.06
        "-groups"           -0.06
        "ospel"             -0.06

    Positive Logits
        Token               Logit
        "quiv"               0.08
        ".base"              0.07
        "ITED"               0.06
        " Articles"          0.06
        (unrendered token)   0.06
        " strike"            0.06
        "ikers"              0.06
        "ymology"            0.06
        "electric"           0.06
        (unrendered token)   0.06
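Feature dashboards like this one commonly produce the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix, then ranking the resulting per-token scores. This page does not say how its values were computed, so the sketch below is an assumption of that method, not this dashboard's code. It ignores any final layer norm and uses plain Python lists in place of tensors. The function `logit_effects` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],        # hypothetical: feature's decoder vector, length d_model
    unembed: Sequence[Sequence[float]],  # hypothetical: unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],                # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect."""
    d_model = len(decoder_dir)
    # Direct effect on each token's logit: dot product of the decoder
    # direction with that token's column of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first, like the Negative Logits list
    positive = ranked[::-1][:k]  # most positive first, like the Positive Logits list
    return negative, positive
```

With a toy two-dimensional model, `logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)` ranks `"b"` as the most suppressed token and `"a"` as the most promoted one.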
    Activation density: 0.003%
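Activation density is usually reported as the percentage of token positions on which the feature fires. The page does not give its firing threshold, so this minimal sketch assumes "fires" means an activation strictly above zero. The helper `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater
    than the threshold (zero by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Avoid dividing by zero when no tokens were observed.
    if total == 0:
        return 0.0
    return 100.0 * active / total
```

Under this definition, a feature that fires on 3 of 100,000 sampled tokens has a density of 0.003%.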

    No Known Activations