    Explanations

    concepts and discussions related to crime and harm

    Negative Logits
    Token            Logit
    "imson"          -0.16
    "ovat"           -0.16
    " Kok"           -0.16
    " Karma"         -0.15
    "vae"            -0.15
    " František"     -0.14
    " McCorm"        -0.14
    "acha"           -0.14
    "izm"            -0.14
    "ิง"             -0.14
    Positive Logits
    Token            Logit
    " crime"         0.48
    " Crime"         0.40
    "crime"          0.39
    " crimes"        0.38
    "-cr"            0.36
    "Crime"          0.35
    ".cr"            0.32
    " Crimes"        0.31
    "_cr"            0.30
    "犯罪"           0.30
    Activation Density: 0.055%

    No Known Activations