INDEX
    Explanations

    phrases indicating limitations in studies or arguments

    Negative Logits
    Token          Logit
    .              -0.67
     تضيفلها       -0.57
    BARA           -0.53
    qrt            -0.53
    َه             -0.51
     دیکھیے        -0.49
    翡翠           -0.48
     kasarigan     -0.47
    ִי             -0.47
    tır            -0.47
    Positive Logits
    Token            Logit
     تانيه           1.03
     يتيمه           0.73
     ویکی‌پدیا       0.69
     كومونز          0.61
    }]\n             0.60
    contentLoaded    0.59
    RegressionTest   0.58
    '])->            0.58
    istoitu          0.57
    '],\n            0.57
    Activation Density: 0.595%

    No Known Activations