    Explanations

    countries, political figures, and controversial topics or actions

    specific proper nouns and terms related to political and social issues

    Negative Logits
    Token          Logit
    "ume"          -0.66
    " Gil"         -0.66
    " toile"       -0.61
    " Nanto"       -0.60
    " Ern"         -0.60
    " Brune"       -0.58
    " menstrual"   -0.57
    " denomin"     -0.55
    " colle"       -0.55
    " backdrop"    -0.55
    Positive Logits
    Token          Logit
    "vantage"      0.79
    " hadn"        0.77
    " deserved"    0.75
    " shouldn"     0.74
    " should"      0.71
    "bably"        0.70
    "cheat"        0.70
    " couldn"      0.69
    "seless"       0.69
    "helps"        0.69
    Activation Density: 0.612%

    No Known Activations