    Explanations

    expressions of negation or denial

    Negative Logits
    ei        -0.20
    y         -0.19
    a         -0.19
    eid       -0.19
    e         -0.17
    c         -0.16
    ern       -0.16
    o         -0.16
    ประมาณ    -0.15
    ی         -0.15
    Positive Logits
    etwork        0.20
    _REF          0.17
    ’t            0.17
    mue           0.17
    naire         0.16
    't            0.16
    ouncements    0.15
    atural        0.15
    iqu           0.15
    avigate       0.15
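    The page does not state how the two logit lists are computed. A common convention for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and report the most negative and most positive entries. The sketch below follows that assumption; `top_logit_tokens`, `decoder_dir`, and `W_U` are hypothetical names, and the toy vocabulary is illustrative only.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return (negative, positive) lists of (token, logit effect) pairs.

    Assumption: a feature's effect on the output logits is its decoder
    direction (shape: d_model) times the unembedding matrix
    (shape: d_model x vocab_size).
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # one value per vocab entry
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.0, 0.15],
                [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(neg)  # most negative effects: "b", then "c"
print(pos)  # most positive effects: "a", then "d"
```

    Under this convention a negative value means the feature pushes the model away from predicting that token, and a positive value means it pushes toward it.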
    Act Density 0.184%
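    Activation density is usually reported as the percentage of tokens on which the feature's activation is above zero; that definition is an assumption here, since the page gives only the figure. Under it, 0.184% means the feature fires on roughly 1 token in 540 (100 / 0.184 ≈ 543). A minimal sketch, where `activation_density` and its `threshold` parameter are hypothetical:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density counts strictly positive activations (threshold 0).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 1000 tokens, feature active on 2 of them -> 0.2%
acts = np.zeros(1000)
acts[[10, 500]] = 1.3
print(activation_density(acts))
```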

    No Known Activations