INDEX
    Explanations

    elements related to political discourse and issues

    Negative Logits
    .",                     -0.94
    /−                      -0.92
    .";                     -0.87
    [unrendered token]      -0.87
    ."),                    -0.84
    "),                     -0.82
    )•                      -0.82
    )−                      -0.81
    ?—                      -0.79
     (−                     -0.78

    Positive Logits
    !!                      1.52
    ''                      1.27
    **                      1.24
    !!!                     1.19
     [[                     1.19
    ??                      1.18
     ''                     1.17
    ***                     1.17
     !!                     1.14
    '''                     1.12
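The two lists are the ten tokens with the most negative and the ten with the most positive logit values for this feature. A minimal sketch of that ranking step, assuming the per-token logit effects are already available as a mapping from token string to value (how the dashboard derives those values is not stated on this page, and the function name `top_and_bottom_k` is hypothetical):

```python
import heapq


def top_and_bottom_k(logit_effects: dict[str, float], k: int = 10):
    """Return the k tokens with the largest and the k with the smallest logit effect.

    `logit_effects` is a hypothetical mapping from token string to the feature's
    effect on that token's logit; its derivation is assumed, not shown here.
    """
    items = list(logit_effects.items())
    # Highest values first: corresponds to the "Positive Logits" list.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    # Lowest values first: corresponds to the "Negative Logits" list.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


# Toy input: five entries copied from the lists above plus one made-up token.
effects = {
    "!!": 1.52,
    "''": 1.27,
    "**": 1.24,
    '.",': -0.94,
    "/−": -0.92,
    "the": 0.01,  # hypothetical value, not from the dashboard
}

pos, neg = top_and_bottom_k(effects, k=2)
print(pos)  # [('!!', 1.52), ("''", 1.27)]
print(neg)  # [('.",', -0.94), ('/−', -0.92)]
```

With k=10 over a full vocabulary, the same call would produce lists shaped like the ones above; `heapq.nlargest` avoids sorting the whole vocabulary.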
    Activation density: 0.822%
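Activation density is usually read as the percentage of tokens on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the function name `activation_density` are assumptions, not taken from this page):

```python
def activation_density(activations, threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes density = share of tokens on which the feature fires, with
    firing defined as activation > threshold (0 by default).
    """
    acts = list(activations)
    if not acts:
        return 0.0
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Hypothetical activations: 822 firing positions out of 100,000 tokens.
acts = [0.0] * 99_178 + [0.5] * 822
print(f"{activation_density(acts):.3f}%")  # 0.822%
```

Under this reading, 0.822% means the feature is active on roughly 8 of every 1,000 tokens.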

    No Known Activations