    Explanations
    Negative Logits
     bedroom
    -0.06
    (java
    -0.06
     زیاد
    -0.06
    adow
    -0.06
    agnostic
    -0.06
    -0.06
     máu
    -0.06
     doctoral
    -0.06
     اس
    -0.06
    =top
    -0.06
    Positive Logits
     Yelp
    0.07
    (parameter
    0.06
    oruč
    0.06
    (expect
    0.06
     Sick
    0.06
     Fantasy
    0.06
    SO
    0.06
     Filipino
    0.06
     lokale
    0.06
     Bil
    0.06
    Activation Density: 0.039%

    No Known Activations