    Explanations

    phrases related to social issues or commentary

    instances of the word "the."

    Negative Logits
    Token             Logit
    "thood"           -0.72
    "eatures"         -0.69
    " Alternatively"  -0.68
    "nesty"           -0.65
    "OSH"             -0.64
    "Site"            -0.64
    " è£ıè"           -0.64
    "ason"            -0.64
    "aken"            -0.63
    " besides"        -0.63
    Positive Logits
    Token             Logit
    " rest"           1.00
    " slightest"      1.00
    "ses"             0.98
    " smallest"       0.94
    " entirety"       0.88
    " hardest"        0.87
    " whole"          0.87
    " brightest"      0.87
    " vast"           0.86
    " heaviest"       0.86
    Act Density 0.196%

    No Known Activations