    Explanations

    words related to causing harm or injury

    Negative Logits
    Token                 Logit
    (token not shown)     -0.43
    " koke"               -0.40
    (token not shown)     -0.40
    "zelt"                -0.39
    " decision"           -0.39
    " lengu"              -0.39
    "usias"               -0.38
    " Forscher"           -0.38
    " Erben"              -0.38
    " negó"               -0.38
    Positive Logits
    Token                 Logit
    " causing"            0.83
    "causing"             0.77
    " inflicting"         0.66
    " harm"               0.60
    " causando"           0.60
    " damage"             0.60
    "addGap"              0.59
    " gây"                0.57
    "addPreferredGap"     0.56
    " Damage"             0.54
    Activation Density: 0.022%

    No Known Activations