    Explanations

    statements related to political criticism and advocacy

    Negative Logits
    Token        Logit
    "igy"        -0.16
    "ctl"        -0.14
    "WITHOUT"    -0.13
    ".dy"        -0.13
    "illy"       -0.13
    "mez"        -0.13
    "τρ"         -0.13
    " Altern"    -0.13
    " Ör"        -0.13
    "ofi"        -0.13
    Positive Logits
    Token        Logit
    " nor"       1.13
    " Nor"       0.96
    "nor"        0.88
    "Nor"        0.85
    " NOR"       0.65
    " sondern"   0.53
    " neither"   0.47
    " anymore"   0.45
    " بلکه"      0.44
    " нор"       0.39
    Activation Density: 0.273%

    No Known Activations