INDEX
    Explanations

    positive information or highlights in text

    phrases that convey positive news or highlights about various topics

    Negative Logits
    Token           Logit
    "ivalent"       -0.74
    "indust"        -0.70
    "adult"         -0.70
    "heit"          -0.70
    "20439"         -0.70
    "igmatic"       -0.68
    "ancies"        -0.67
    "amental"       -0.66
    "throp"         -0.66
    "urch"          -0.64
    Positive Logits
    Token           Logit
    " bonus"        0.71
    " avoids"       0.67
    " cures"        0.66
    " additions"    0.66
    " luckily"      0.65
    ":]"            0.64
    " rewards"      0.64
    " overlooking"  0.63
    " cushion"      0.63
    " Bonus"        0.62
    Activation Density: 0.166%

    No Known Activations