INDEX
    Explanations

    negative imperatives or admonitions

    Negative Logits
    Datuak    -0.83
    лтемелер    -0.72
     ویکی‌پدیا    -0.70
     couverte    -0.70
     mourut    -0.69
     CreateTagHelper    -0.69
     scattata    -0.68
    Hauptartikel    -0.67
    OGND    -0.66
     medesimo    -0.65
    Positive Logits
     forget    0.78
     afraid    0.63
     Donny    0.62
     forgetting    0.62
     Don    0.61
    Don    0.60
     Jangan    0.60
     يتيمه    0.59
    Dont    0.57
    ванович    0.57
    Activation Density: 0.047%

    No Known Activations