    Explanations

    phrases indicating confrontation or opposition

    Negative Logits
     Helpful
    -0.69
    catentry
    -0.68
    vous
    -0.66
     poisoning
    -0.62
    ghan
    -0.60
    otom
    -0.59
    ients
    -0.58
    irlf
    -0.57
    batch
    -0.56
    zsche
    -0.56
    Positive Logits
    atoon
    0.75
     doorway
    0.69
    ailable
    0.69
    against
    0.68
    龍契士
    0.67
    ardless
    0.67
    elight
    0.67
    thouse
    0.66
    wark
    0.66
    iage
    0.65
    Act Density 0.064%

    No Known Activations