INDEX
    Explanations

    words related to notable individuals and specific events, potentially from news articles or online forums

    Negative Logits
    <bos>               -0.99
     ***!               -0.74
    __;                 -0.73
    RectangleBorder     -0.72
    HtmlAttribute       -0.70
    .                   -0.69
     >=",               -0.69
     <",                -0.67
    ;#                  -0.65
    ;                   -0.64

    Positive Logits
     impra              2.16
     increa             2.12
     disagre            2.07
     maneu              2.03
     affor              2.00
     emphat             1.98
     reluct             1.94
     unspeak            1.93
     unden              1.93
     fuf                1.93

    Act Density 4.557%

    No Known Activations