INDEX
    Explanations

    instances of negation or denial

    Negative Logits
    Token       Logit
     "          -0.61
     '          -0.58
    cination    -0.49
    us          -0.48
                -0.47
    on          -0.45
    andar       -0.44
    oster       -0.44
     nạ         -0.44
    is          -0.43
    Positive Logits
    Token       Logit
    ’)          1.42
    ’).         1.38
    ’.          1.27
    ’”          1.26
    )’          1.24
    ),”         1.24
    ’,          1.23
    ”),         1.23
    ’?          1.21
    ’:          1.21
    Act Density 0.090%

    No Known Activations