INDEX
    Explanations

    the presence of specific words related to violent events and actions

    Negative Logits
     soto         -0.63
     plis         -0.63
     ekos         -0.60
     muna         -0.59
     encre        -0.59
     stopp        -0.57
     cabrio       -0.57
     habang       -0.56
     Italijani    -0.56
     pecuni       -0.56
    Positive Logits
     écout         0.76
     fameux        0.65
     découv        0.61
     évit          0.61
     curieux       0.60
     parlant       0.59
     conçus        0.59
     offrant       0.58
     réal          0.58
     rassemb       0.58
    Activation Density: 0.337%

    No Known Activations