INDEX
    Explanations

    sentences or phrases indicating a problem or issue

    the phrase "there's something wrong" or variations of it

    Negative Logits
    incinn        -0.74
    NetMessage    -0.73
    cit           -0.72
    herer         -0.70
    xit           -0.67
    weeney        -0.66
    aukee         -0.66
    pole          -0.65
    è¦ļéĨĴ        -0.64
    achev         -0.64
    Positive Logits
    headed        0.78
    eous          0.74
    fully         0.71
     behaviour    0.71
    mouth         0.70
     wrong        0.69
     havoc        0.69
    aligned       0.67
    align         0.66
    doing         0.65
    Activation Density: 0.012%

    No Known Activations