INDEX
    Explanations

    phrases that indicate a negative outcome or denial

    Negative Logits
    Token      Logit
    ray        -0.16
    rape       -0.16
    lng        -0.16
    erville    -0.15
    pga        -0.15
    imonial    -0.14
    ness       -0.14
    packing    -0.14
    like       -0.14
    rech       -0.14

    Positive Logits
    Token      Logit
    oks         0.35
    sey         0.33
    okie        0.33
    xious       0.31
    ok          0.30
    veau        0.30
    thin        0.28
    thern       0.28
    ther        0.28
    seg         0.27
    Activation Density: 0.039%

    No Known Activations