INDEX
    Explanations

    phrases that indicate disagreement or contrast

    rather than, which was, which results, that gives

    Negative Logits
     يتيمه         -0.64
     Савезне       -0.64
    rrggbb         -0.56
    Diweddarwch    -0.55
    Brainz         -0.55
     Theſe         -0.53
    цездатний      -0.52
    AISSEE         -0.51
    vician         -0.49
    delwed         -0.49
    Positive Logits
     समीक्षाएं        0.38
    gds            0.33
     alb           0.33
     lethal        0.33
    hasErrors      0.33
    unen           0.33
    define         0.33
    uas            0.33
    unk            0.33
     artery        0.32
    Activation Density: 0.227%

    No Known Activations