INDEX
    Explanations

    criticism of political figures and institutions

    Negative Logits
    "gow"       -0.81
    "zac"       -0.76
    "essee"     -0.69
    "yip"       -0.69
    "ovember"   -0.67
    "ividual"   -0.65
    "seys"      -0.65
    "iewicz"    -0.63
    "jri"       -0.62
    "matter"    -0.62
    Positive Logits
    " collapsing"    0.64
    "thumbnails"     0.62
    "NVIDIA"         0.58
    "Asset"          0.57
    "Running"        0.57
    " Rossi"         0.56
    "Heart"          0.56
    " Kend"          0.55
    "ooters"         0.55
    " intertwined"   0.55
    Activation Density: 0.006%

    No Known Activations