INDEX
    Explanations

    differences or contradictions in statements

    phrases related to conversation and communication dynamics

    Negative Logits
    culus        -0.98
    atri         -0.82
    cephal       -0.76
    ahime        -0.76
    ggle         -0.74
    taboola      -0.74
    anus         -0.72
    ãĥī          -0.71
    tnc          -0.71
    ummer        -0.68
    Positive Logits
     THEN         0.97
     preferably   0.93
     then         0.89
     verbally     0.87
     paraph       0.86
     retweet      0.80
     phrases      0.79
     concise      0.79
     criticize    0.79
     orally       0.78
    Activation Density: 0.625%

    No Known Activations