    Explanations

    negative phrases or terms related to rejection or disapproval

    Negative Logits
      Token       Logit
      "pga"       -0.18
      "bee"       -0.17
      "ko"        -0.17
      "hev"       -0.17
      "rott"      -0.17
      "ropriate"  -0.15
      "nya"       -0.14
      "rape"      -0.14
      "ritt"      -0.14
      "ritten"    -0.14
    Positive Logits
      Token       Logit
      "sey"       0.29
      "veau"      0.28
      "okie"      0.26
      "seg"       0.25
      "ont"       0.24
      "xious"     0.23
      "odge"      0.23
      "isi"       0.23
      " holds"    0.23
      "oks"       0.22
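    The page does not say how these logit values were computed. Dashboards of this kind often project the feature's decoder direction onto the model's unembedding matrix and list the lowest and highest scoring tokens. The sketch below assumes that approach. The function and parameter names (logit_effects, decoder_dir, unembed) are hypothetical, and it uses plain lists instead of a tensor library. Note that a token such as " holds" includes its leading space.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: score every token by the dot product of a
    feature's decoder direction with that token's unembedding column.

    Assumes `unembed` is laid out as d_model rows x vocab_size columns.
    Returns (k most negative, k most positive) as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for j, token in enumerate(vocab):
        # Direct effect of the feature direction on this token's logit.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]           # most suppressed tokens
    positive = ranked[::-1][:k]     # most promoted tokens
    return negative, positive


# Toy example with a 2-dimensional model and a 3-token vocabulary.
neg, pos = logit_effects([1.0, 0.0], [[0.5, 0.3, -0.2], [0.0, 1.0, 1.0]],
                         ["a", " holds", "b"], k=1)
# neg == [("b", -0.2)], pos == [("a", 0.5)]
```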
    Activation density: 0.034% (share of sampled tokens on which the feature activates)
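    As an illustration of the arithmetic, a density of 0.034% corresponds to roughly 340 activating tokens per million. The sketch below assumes density means the percentage of token positions whose activation is above zero. The corpus, sample size, and threshold behind this page's figure are not stated, and the helper name activation_density is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose feature
    activation exceeds `threshold` (assumed to be 0 for this sketch)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 340 activating tokens out of 1,000,000 gives a density of about 0.034%.
density = activation_density([0.0] * 999_660 + [1.0] * 340)
```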

    No Known Activations