    Negative Logits
     springfox        -0.73
     وتسجيلات         -0.65
     beginnetje       -0.63
     OMITBAD          -0.61
     <=",             -0.60
    URLException      -0.59
     nahilalakip      -0.59
     ویکی‌پدی          -0.56
     Référence        -0.56
    enschaften        -0.55
    Positive Logits
     harder            0.81
     stronger          0.71
     more              0.68
     wiser             0.66
     easier            0.63
     better            0.62
     worse             0.61
     tougher           0.60
     lebih             0.60
     funnier           0.60
    Activation Density: 0.003%

    No Known Activations