    Explanations

    phrases indicating negation or denial of responsibility

    Negative Logits
    Token       Logit
    "ruba"      -0.17
    "جر"        -0.17
    "oklyn"     -0.16
    "uzey"      -0.15
    "íĥģ"       -0.14
    "ë°į"       -0.14
    ".Apis"     -0.14
    "reopen"    -0.14
    "utral"     -0.14
    "arel"      -0.14
    Positive Logits
    Token       Logit
    " still"    0.18
    "obs"       0.17
    " mon"      0.16
    "acho"      0.14
    " fault"    0.14
    "ora"       0.14
    "era"       0.14
    "alt"       0.14
    "amine"     0.14
    " ainda"    0.14
    Activation Density: 0.115%

    No Known Activations