INDEX
    Explanations

    phrases relating to effects or consequences

    Negative Logits
    bow         -0.76
    cele        -0.69
    approved    -0.64
    course      -0.61
    mad         -0.60
    away        -0.60
    fter        -0.59
    ption       -0.58
    ilings      -0.58
    media       -0.58
    Positive Logits
     tremend    0.84
     alot       0.79
     raining    0.78
    bnb         0.77
    ynthesis    0.71
     CTR        0.70
    ometimes    0.69
    rontal      0.67
    ichick      0.63
    inently     0.63
    Activation Density: 0.241%

    No Known Activations