INDEX
    Explanations

    phrases or sentences indicating potential consequences or outcomes

    phrases that indicate causation or consequences

    Negative Logits
    Token       Logit
    Fram        -0.59
    iling       -0.59
    pload       -0.59
    iddler      -0.59
    atching     -0.58
    ighth       -0.58
     Sunshine   -0.57
    terday      -0.56
    afort       -0.56
    schild      -0.56
    Positive Logits
    Token       Logit
    gers        0.90
    wcs         0.84
    uez         0.78
    ging        0.76
    -+          0.74
    iments      0.73
    ges         0.71
     inex       0.71
    better      0.71
    stones      0.70
    Activation Density: 0.039%

    No Known Activations