    Explanations

    negative expressions or sentiments

    Negative Logits
    Token    Logit
    N        -0.17
    h        -0.16
    980      -0.16
    ivr      -0.16
    biên     -0.16
    jt       -0.15
    H        -0.15
    q        -0.15
    Ìĥ       -0.14
    Z        -0.14

    Positive Logits
    Token    Logit
    rog      0.16
    sWith    0.15
    Us       0.15
    mite     0.15
    oft      0.15
    ecd      0.15
    bole     0.14
    erial    0.14
    ador     0.14
    gable    0.14
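
    The page does not say how the logit scores above are produced. A common approach in sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and list the tokens with the lowest and highest resulting scores. The function name `logit_effects`, the (d_model x vocab) matrix layout, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical sketch: the score for token t is the dot product of the
    feature's decoder direction with column t of the unembedding matrix,
    which is assumed to be laid out as d_model rows by vocab columns.
    """
    d_model = len(decoder_dir)
    scores: List[TokenScore] = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with the token's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        scores.append((token, s))
    # Ascending order: most negative scores come first.
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of four made-up tokens.
    neg, pos = logit_effects(
        [1.0, 0.5],
        [[0.2, -0.1, 0.0, 0.3], [0.4, 0.0, -0.6, 0.1]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print([t for t, _ in neg])  # ['c', 'b']
    print([t for t, _ in pos])  # ['a', 'd']
```

    Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid a full sort.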

    Activation Density: 0.071%
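
    Activation density is usually read as the fraction of token positions on which the feature fires, so 0.071% would correspond to roughly 71 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are illustrative assumptions, not taken from the page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: `threshold=0.0` assumes a feature counts as
    active whenever its activation is strictly positive.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # 71 active positions out of 100,000 tokens.
    acts = [0.0] * 99_929 + [1.0] * 71
    print(f"{activation_density(acts):.3%}")  # 0.071%
```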

    No Known Activations