INDEX
    Explanations

    words related to authority figures or positions of power

    references to the article "the" in various contexts

    Negative Logits
    Token             Logit
    "SPONSORED"       -0.88
    "VICE"            -0.72
    "wine"            -0.68
    "heses"           -0.67
    "Course"          -0.67
    "JUST"            -0.66
    " SetFontSize"    -0.66
    "pps"             -0.66
    "estyles"         -0.65
    "owned"           -0.64
    Positive Logits
    Token             Logit
    " proverbial"     0.92
    " notion"         0.87
    " edges"          0.83
    " weeds"          0.83
    " heels"          0.83
    " rug"            0.83
    " unsuspecting"   0.82
    " slightest"      0.81
    " offending"      0.81
    " enorm"          0.81
    Activation Density: 0.470%

    No Known Activations