INDEX
    Explanations

    repetitive phrases or articles

    Negative Logits
    Token           Logit
    "thood"         -0.72
    "iffe"          -0.72
    "leeve"         -0.70
    "advertising"   -0.64
    "IDs"           -0.63
    "-$"            -0.61
    "egu"           -0.61
    " because"      -0.60
    "iscover"       -0.60
    " suppose"      -0.58
    Positive Logits
    Token           Logit
    "ses"           1.22
    " same"         1.03
    " entirety"     1.00
    " slightest"    0.97
    " entire"       0.97
    " majority"     0.95
    " latter"       0.94
    " longest"      0.94
    " quickest"     0.94
    " extent"       0.92
    Activation Density: 0.128%

    No Known Activations