INDEX
    Explanations

    negations or negative expressions in the text

    Negative Logits
    "ิลปะ"          -0.68
    "<bos>"         -0.68
    " Atem"         -0.64
    "ing"           -0.62
    "teil"          -0.60
    " Wadsworth"    -0.58
    " bạch"         -0.58
    " merito"       -0.56
    "yana"          -0.56
    "y"             -0.56
    Positive Logits
    " doesn"        1.87
    " Doesn"        1.80
    "doesn"         1.79
    "Doesn"         1.78
    " didn"         1.43
    " DOES"         1.34
    " Does"         1.29
    "Does"          1.29
    " does"         1.28
    " Didn"         1.27
    Activation Density: 0.027%

    No Known Activations