INDEX
    Explanations

    phrases indicating likelihood or probability

    phrases that express similarity or comparison

    Negative Logits
    "arse"         -0.86
    "utical"       -0.82
    "alt"          -0.82
    "ocaust"       -0.77
    " helicop"     -0.75
    "ographies"    -0.74
    "bard"         -0.73
    "isexual"      -0.73
    "itles"        -0.72
    "ategory"      -0.72
    Positive Logits
    "lier"         0.89
    "lihood"       0.86
    " premature"   0.73
    " somebody"    0.69
    " everybody"   0.69
    "liest"        0.68
    " everyone"    0.68
    " they"        0.66
    " fireworks"   0.66
    " someone"     0.66
    Act Density 0.023%

    No Known Activations