INDEX
    Explanations

    words related to surprise or contradiction

    the word "actually" to emphasize certainty or reality

    Negative Logits
    "lain"            -0.79
    "wich"            -0.79
    "cit"             -0.76
    "fu"              -0.68
    "bye"             -0.68
    "heid"            -0.66
    " newsletters"    -0.66
    "illed"           -0.65
    "tailed"          -0.65
    "nan"             -0.61
    Positive Logits
    " metic"          0.79
    "netflix"         0.77
    " comprom"        0.76
    " bothering"      0.74
    " meant"          0.73
    "ional"           0.72
    " WRITE"          0.71
    "okia"            0.70
    "amn"             0.70
    " REALLY"         0.69
    Activation Density: 0.026%

    No Known Activations