INDEX
    Explanations
    Negative Logits
     ویکی‌پدیای
    -0.55
    prende
    -0.50
    DULE
    -0.46
    ंदीखरीदारी
    -0.45
    ētu
    -0.45
    SequentialGroup
    -0.44
     الرياضيه
    -0.44
    ibrated
    -0.44
    jspx
    -0.44
    De
    -0.44
    Positive Logits
    AddTagHelper
    0.69
    TagMode
    0.61
     own
    0.61
     opérateur
    0.59
     OWN
    0.57
    Tango
    0.56
    ]<<"
    0.56
     ex
    0.55
    Referanser
    0.54
     Logement
    0.53
    Act Density 0.004%

    No Known Activations