INDEX
    Explanations

    adverbs of certainty or manner

    Negative Logits
     ruining        0.51
     ruins          0.50
     hates          0.49
     screwed        0.49
     ruined         0.48
     hurting        0.47
     messed         0.47
     messing        0.47
     suka           0.46
     usan           0.46
    Positive Logits
     undeniably     0.54
     inevitably     0.54
     arguably       0.54
     invariably     0.51
    argu            0.47
     undoubtedly    0.46
     ultimately     0.46
     inadvertently  0.45
     inescap        0.45
     subtly         0.43
    Act Density 0.265%

    No Known Activations