INDEX
    Explanations

    concepts related to harm and injury

    Negative Logits
    Token     Logit
    "addock"  -0.15
    "Äįi"     -0.15
    "AGMA"    -0.15
    "abbo"    -0.15
    "ضÙĦ"     -0.14
    "تز"      -0.14
    "appy"    -0.14
    "Ùĥر"     -0.14
    "elpers"  -0.13
    "ourg"    -0.13
    Positive Logits
    Token     Logit
    " wake"   0.34
    "wake"    0.30
    " trail"  0.27
    " Wake"   0.27
    "Wake"    0.25
    " wakes"  0.25
    " Trail"  0.22
    "Trail"   0.22
    "trail"   0.20
    " path"   0.20
    Act Density 0.080%

    No Known Activations