INDEX
    Explanations

    references to historical political terms and concepts

    Negative Logits
    Token        Logit
    " ####"      -0.24
    " ###"       -0.22
    " #####"     -0.22
    " ##"        -0.20
    " ###↵"      -0.16
    " '**"       -0.15
    "_##"        -0.15
    " **"        -0.15
    " #"         -0.15
    " \`"        -0.15

    Positive Logits
    Token        Logit
    "âĨij"        0.33
    "^"          0.32
    "Template"   0.26
    "Wik"        0.26
    " ^"         0.25
    " âĨij"       0.25
    " Template"  0.25
    " ^↵"        0.24
    ":^"         0.23
    ".^"         0.23

    Activation Density: 0.019%

    No Known Activations