INDEX
    Explanations

    words associated with negative consequences or outcomes

    phrases indicating causality or consequences

    Negative Logits
    Token          Logit
    "afort"        -0.64
    " Shal"        -0.64
    "iling"        -0.62
    "pload"        -0.62
    " Vaughn"      -0.60
    "terday"       -0.59
    " Sunshine"    -0.58
    " Scotia"      -0.58
    "atu"          -0.57
    " tuber"       -0.57
    Positive Logits
    Token          Logit
    "gers"         0.91
    "entious"      0.88
    "wcs"          0.84
    "better"       0.77
    "-+"           0.76
    "iments"       0.74
    "ges"          0.71
    "ging"         0.70
    "rush"         0.68
    "GGGG"         0.67
    Activation Density: 0.031%

    No Known Activations