    Explanations

    is followed by affirmation

    Negative Logits
     horrible        0.87
     stupid          0.86
     crappy          0.85
     messed          0.84
     evil            0.81
     crazy           0.80
     generalizes     0.77
     ruining         0.76
     newbies         0.74
     destroys        0.74
    Positive Logits
     remarkably      1.35
     undoubtedly     1.31
     undeniably      1.30
     strikingly      1.15
     unquestionably  1.14
     decidedly       1.09
     unmistak        1.06
     markedly        1.05
     eminently       1.03
     testament       1.03
    Activation Density: 0.617%

    No Known Activations