    Explanations
    Negative Logits
     تعرض
    -0.08
     anex
    -0.08
     Synthetic
    -0.08
    Synthetic
    -0.07
     basically
    -0.07
     лиш
    -0.07
     año
    -0.07
     assistants
    -0.07
    (native
    -0.07
     approx
    -0.07
    Positive Logits
     rumors
    0.09
    0.09
     nghe
    0.09
     vẻ
    0.08
     réputation
    0.08
    0.08
    0.08
     gehoord
    0.08
     heard
    0.08
     પ્રમાણે
    0.08
    Act Density 0.038%

    No Known Activations