INDEX
    Explanations

    terms related to punishment and its variations

    Negative Logits
    "¥¤"       -0.15
    "گرÛĮ"     -0.15
    "editary"  -0.15
    "KIT"      -0.15
    "hrs"      -0.15
    "esk"      -0.14
    "ullah"    -0.14
    "füh"      -0.14
    "oga"      -0.14
    "outes"    -0.14
    Positive Logits
    "jabi"      0.24
    "185"       0.22
    "pun"       0.21
    "ning"      0.19
    "ks"        0.18
    "ishments"  0.17
    " pun"      0.16
    "erli"      0.16
    "sters"     0.15
    "ishment"   0.15
    Act Density 0.011%

    No Known Activations