    Explanations

    terms related to propaganda and recruitment activities

    Negative logits
    arget       -0.17
    eman        -0.15
    errar       -0.15
    eken        -0.15
    βο          -0.14
    ernaut      -0.14
    aket        -0.14
    βολ         -0.14
     comput     -0.13
    ematik      -0.13

    Positive logits
    incer        0.16
    ytic         0.15
    _AI          0.15
    illis        0.15
    оры          0.14
    择           0.14
    ijk          0.14
    zcze         0.14
    iffs         0.14
    yat          0.14
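    Raw dashboard exports often show non-Latin tokens in byte-level BPE form, for example "æĭ©" for 择 or "ÑĢÑĭ" for the Cyrillic "ры". A minimal sketch of the reverse mapping follows. It assumes the underlying model uses the GPT-2-style byte-to-unicode table (the page does not name the model), and the helper names `bytes_to_unicode` and `decode_token` are illustrative, not taken from any dashboard code.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Assumption: GPT-2-style byte-level BPE display table.
    # Printable bytes map to the Latin-1 character with the same code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Invert the table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level display string back into readable text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A single token can hold an incomplete UTF-8 sequence, so replace rather than fail.
    return raw.decode("utf-8", errors="replace")


print(decode_token("æĭ©"))   # 择
print(decode_token("ÑĢÑĭ"))  # ры
```

    Plain ASCII tokens such as "ijk" pass through unchanged, because printable ASCII bytes map to themselves in this table.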
    Activation density: 0.036%
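    As a rough reading of the density figure, 0.036% corresponds to about 36 active positions per 100,000 tokens. The sketch below assumes density means the percentage of sampled token positions where the feature's activation exceeds zero. The actual threshold and sample set behind this page are not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`.

    Assumption: "density" = share of sampled tokens with activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 36 nonzero activations among 100,000 token positions.
acts = [0.0] * 99_964 + [2.5] * 36
print(f"{activation_density(acts):.3f}%")  # 0.036%
```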

    No Known Activations