    Explanations

    negative descriptors and terms related to criticism

    Negative Logits
    Token    Logit
    is       -0.51
    ia       -0.40
    d        -0.32
    ÙĬ       -0.31
    us       -0.30
    i        -0.29
    t        -0.28
    Ø©       -0.28
    al       -0.28
    e        -0.27
    Positive Logits
    Token    Logit
    ror      0.17
    othy     0.17
    thern    0.16
    bid      0.15
    respond  0.15
    theast   0.15
    phan     0.14
    بÛĮÙĨ    0.14
    иÑĪ      0.14
    rr       0.13
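
Several tokens in the lists above (for example ÙĬ and Ø©) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The source does not name the tokenizer, so the following is only a minimal sketch that assumes a GPT-2-style byte-to-character table. The function names `bytes_to_unicode` and `decode_token` are illustrative choices, not part of the dashboard. Characters that are not in the table (such as an already-decoded Arabic or Cyrillic letter) are passed through as their own UTF-8 bytes.

```python
def bytes_to_unicode():
    """Build a GPT-2-style map from byte value to a printable character.

    Assumption: the tokenizer behind these tokens uses this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    # Remaining bytes are assigned code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Not a stand-in character: keep it as its own UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Incomplete byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÙĬ", "Ø©", "is"]:
        decoded = decode_token(tok)
        print(repr(tok), "->", repr(decoded), [hex(ord(c)) for c in decoded])
```

Under this assumption the non-Latin entries decode to short Arabic-script and Cyrillic fragments rather than being corrupt text. Plain ASCII tokens such as `is` come back unchanged.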
    Activation Density: 0.035%

    No Known Activations