INDEX
    Explanations

    phrases that indicate responsibility or accountability

    Negative Logits
    " unbelievably"    -0.95
    " fucking"         -0.94
    " insanely"        -0.93
    " goddamn"         -0.93
    "fucking"          -0.90
    " absolutely"      -0.90
    " utterly"         -0.89
    " FUCKING"         -0.89
    "absolutely"       -0.87
    " EVERY"           -0.85

    Positive Logits
    " perhaps"          1.21
    "perhaps"           1.10
    " Perhaps"          1.02
    " somewhat"         0.97
    "Perhaps"           0.95
    " maybe"            0.95
    "<bos>"             0.92
    "anskje"            0.91
    " Somewhat"         0.89
    " vielleicht"       0.88
    Activation Density: 0.862%

    No Known Activations