    Explanations

    prepositions

    Negative Logits
     מבח            -0.08
    ogado           -0.08
    spartner        -0.08
     partitions     -0.08
     princess       -0.07
     partition      -0.07
     שלכם           -0.07
     Light          -0.07
    itatem          -0.07
                    -0.07
    Positive Logits
     abuses         0.11
     bullying       0.11
     crimes         0.10
     intimidation   0.10
     abuse          0.10
     kidnapping     0.09
    侵犯            0.09
     terrorism      0.09
    犯罪            0.09
     vandal         0.09
    Activation Density 0.036%

    No Known Activations