INDEX
    Explanations

    mentions of social issues and controversies, particularly around religion and politics

    Negative Logits
    ゴン
    -0.62
    ORPG
    -0.59
    olutions
    -0.57
    aukee
    -0.56
     Balanced
    -0.55
     Flavoring
    -0.55
     Defeat
    -0.55
    gat
    -0.54
    youtu
    -0.52
    ワン
    -0.52
    Positive Logits
     alas
    1.11
     unsurprisingly
    1.03
     uh
    0.98
     indeed
    0.85
     albeit
    0.85
     unfortunately
    0.84
     um
    0.84
     frankly
    0.84
     admittedly
    0.84
     moreover
    0.82
    Act Density 0.066%

    No Known Activations