INDEX
    Explanations

    references to media outlets or individuals associated with controversial or right-wing views

    Negative Logits
    Token            Logit
    "AI"             -0.17
    " Fir"           -0.17
    "ysi"            -0.15
    "imer"           -0.14
    " Rog"           -0.14
    " Marc"          -0.14
    "LR"             -0.14
    " Hidden"        -0.14
    " Huntington"    -0.14
    "ivil"           -0.14
    Positive Logits
    Token              Logit
    "Vectorizer"       0.16
    "UIT"              0.15
    "AMIL"             0.15
    "æģ¯"              0.15
    "-Ta"              0.15
    " bek"             0.14
    "ationToken"       0.14
    "_Params"          0.14
    ".SuspendLayout"   0.14
    "uits"             0.14
    Activation Density: 0.002%

    No Known Activations
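
    As a rough illustration of the density figure reported above, the sketch below assumes that activation density means the percentage of sampled tokens on which the feature fires at all (activation greater than zero). That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens with a non-zero activation.
    The dashboard's exact definition is not stated in the source.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 2 active tokens out of 100,000.
sample = [0.0] * 99_998 + [1.3, 0.7]
density_percent = activation_density(sample)
print(f"{density_percent:.3f}%")
```

    Under this assumed definition, a density of 0.002% corresponds to roughly 2 active tokens per 100,000, which fits a page that lists no known activating examples.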