INDEX
    Explanations

    mentions of political topics or terms

    mentions of policy-related terms

    Negative Logits
    "é¾įåĸļ士"    -0.88
    "SAY"        -0.81
    "Parts"      -0.80
    "ACTED"      -0.80
    "Hidden"     -0.77
    "CVE"        -0.75
    "ADRA"       -0.75
    "TEXTURE"    -0.75
    "OULD"       -0.72
    "Render"     -0.72
    Positive Logits
    " pol"       1.25
    "ipop"       1.09
    " recip"     0.85
    "ikarp"      0.81
    "iton"       0.80
    "atile"      0.76
    "arity"      0.76
    "atcher"     0.73
    "igon"       0.73
    " elbows"    0.72
    Act Density 0.004%

    No Known Activations