    Explanations

    references to death or dying

    Negative Logits
    "ial"           -0.17
    "ipro"          -0.15
    "اره"           -0.15
    "ver"           -0.15
    "insky"         -0.15
    "ours"          -0.15
    "mi"            -0.14
    "ifetime"       -0.14
    "만"            -0.14
    "rette"         -0.14
    Positive Logits
    "lectric"        0.21
    " young"         0.18
    " intest"        0.18
    "elp"            0.17
    "hard"           0.16
    "daş"            0.16
    "young"          0.15
    " defending"     0.15
    " Lambert"       0.15
    "-hard"          0.15
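The two logit lists appear to rank vocabulary tokens by how strongly this feature raises or lowers their output logits. Below is a minimal sketch of how such lists are commonly produced. It rests on an assumption this page does not confirm: each value is taken to be the dot product of the feature's decoder direction with that token's unembedding column. The function names and the toy two-dimensional vectors are hypothetical. Real dashboards may also fold in layer normalization or other scaling.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper: direct logit effect of one feature, ASSUMING the value for
# each token is dot(decoder_direction, unembedding_column[token]).
def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }

# Split the effects into the k most positive and k most negative tokens,
# each list ordered from the strongest effect downward.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative

if __name__ == "__main__":
    # Toy 2-dimensional model with a three-token vocabulary (made-up numbers).
    decoder_dir = [1.0, -0.5]
    unembed = {
        "young": [0.2, 0.1],
        "hard": [0.05, -0.1],
        "ial": [-0.1, 0.2],
    }
    effects = feature_logit_effects(decoder_dir, unembed)
    positive, negative = top_and_bottom(effects, k=1)
    print("Positive:", positive)
    print("Negative:", negative)
```

Each list on this page shows ten tokens, which corresponds to k = 10 over a much larger vocabulary. `heapq.nlargest` and `heapq.nsmallest` select those entries without sorting the whole vocabulary.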
    Activation density: 0.025%
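Activation density is typically the share of tokens, over some sample of text, on which the feature fires. A value of 0.025% corresponds to about 1 token in 4,000. The sketch below computes this share under one assumption: "fires" means an activation strictly above zero, as in a ReLU-based sparse autoencoder. This page does not state the sample dataset or the exact threshold behind its figure, and the function names are hypothetical.

```python
from typing import Iterable

# Hypothetical helper: fraction of tokens whose activation exceeds `threshold`.
# The default of 0.0 ASSUMES a ReLU-style SAE where inactive features output 0.
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute a density")
    return active / total

def as_percent(density: float) -> str:
    # Format as a percentage with three decimals, e.g. "0.025%".
    return f"{density * 100:.3f}%"

if __name__ == "__main__":
    # Made-up sample: 4,000 tokens, and the feature fires on exactly one of them.
    sample = [0.0] * 3999 + [1.3]
    print(as_percent(activation_density(sample)))
```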

    No Known Activations