INDEX
    Explanations

    phrases indicating surprise or contradiction

    the word "actually" in various contexts

    Negative Logits
    "lain"      -0.73
    "illed"     -0.69
    "fu"        -0.69
    "bye"       -0.68
    "wich"      -0.68
    "cit"       -0.67
    "oute"      -0.66
    "heid"      -0.63
    "legged"    -0.63
    "ado"       -0.63
    Positive Logits
    " comprom"      0.88
    " meant"        0.83
    " bothering"    0.81
    "olkien"        0.80
    " REALLY"       0.77
    " metic"        0.73
    " bother"       0.72
    " quite"        0.71
    " intended"     0.68
    " actually"     0.67
    Activation Density: 0.024%

    No Known Activations