INDEX
    Explanations

    short phrases that are followed by a rhetorical question or a statement with emphasis

    phrases indicating awareness or understanding of a situation

    Negative Logits
    Ire          -0.78
    aldi         -0.72
    é¾įå         -0.70
    nor          -0.70
    ISA          -0.67
    oes          -0.66
    ophon        -0.65
    Contents     -0.64
    itals        -0.64
    falls        -0.64
    Positive Logits
     kidding      0.82
     yourselves   0.81
     yourself     0.70
     bored        0.65
    ?!            0.63
     drill        0.62
     wanna        0.61
     spoil        0.59
     me           0.58
     ya           0.58
    Act Density 0.182%

    No Known Activations