INDEX
    Explanations

    references to individuals' political roles and affiliations

    Negative Logits
    тро
    -0.16
    kest
    -0.14
    visa
    -0.14
    ラック
    -0.14
    -visible
    -0.14
    ights
    -0.13
     Bounty
    -0.13
     плат
    -0.13
    ↵↵
    -0.13
    owell
    -0.13
    Positive Logits
     Chef
    0.20
     Refer
    0.20
    stell
    0.20
     Le
    0.20
     Dire
    0.19
     Che
    0.19
     Sek
    0.18
     refer
    0.18
     chef
    0.18
     che
    0.17
    Act Density 0.019%

    No Known Activations