    Explanations

    text fragments containing communication patterns like email responses

    instances of written correspondence or email formats

    Negative Logits
        " adversaries"   -0.68
        " principals"    -0.68
        "æ©"             -0.67
        " comprom"       -0.66
        " marches"       -0.64
        " lenders"       -0.62
        " exting"        -0.62
        "åĤ"             -0.62
        "angering"       -0.62
        " escal"         -0.61
    Positive Logits
        " Quote"         1.18
        "Hi"             1.17
        "Excellent"      1.16
        "Quote"          1.16
        "Nice"           1.13
        "Hello"          1.11
        "Originally"     1.11
        "wow"            1.10
        "nice"           1.10
        "yeah"           1.08
    Activation Density: 0.106%

    No Known Activations