INDEX
    Explanations

    mentions of political figures and activities related to politics

    Negative Logits
    " Sidd"        -1.12
    "IELD"         -1.02
    "UGE"          -1.01
    "ADE"          -0.99
    "ORGE"         -0.99
    "enegger"      -0.95
    " Goldberg"    -0.94
    "VEL"          -0.89
    "HEAD"         -0.87
    " Learning"    -0.87
    Positive Logits
    "itives"       1.79
    "itive"        1.72
    "§"            1.54
    "ĭ"            1.53
    "otaur"        1.52
    "eties"        1.52
    "Ľ"            1.51
    "İ"            1.48
    "acy"          1.47
    "į"            1.43
    Act Density 2.188%

    No Known Activations