    Explanations

    negative affirmations or expressions of absence

    "no" followed by a noun

    no followed by negation

    Negative Logits
    "the"            -0.55
    "<bos>"          -0.51
    " terakhir"      -0.50
    "Some"           -0.50
    " redor"         -0.48
    " persino"       -0.48
    " متعلقه"        -0.47
    " culturelles"   -0.47
    " فريبيس"        -0.46
    " gangen"        -0.46

    Positive Logits
    "tably"           1.00
    " doubt"          0.93
    " longer"         0.92
    "tifies"          0.91
    "etheless"        0.89
    "tifying"         0.84
    "vartis"          0.84
    "coda"            0.81
    "odles"           0.81
    "xious"           0.81
    Act Density 0.129%
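    An activation density of 0.129% means the feature fires on roughly 1.29 of every 1,000 tokens, or about 1 in 775. The sketch below assumes density is the percentage of token positions whose activation is above zero. The function name and the threshold parameter are hypothetical, and the input is toy data.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature's activation exceeds a threshold (assumed 0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy data: the feature fires on 1 of 1,000 token positions.
acts = [0.0] * 999 + [3.2]
print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.100%
```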

    No Known Activations