INDEX
    Explanations

    mentions of the word "torture"

    Negative Logits
    Token               Logit
    " Darkness"         -0.78
    " magnification"    -0.75
    " donor"            -0.68
    "lihood"            -0.66
    " Farn"             -0.65
    " Prospect"         -0.64
    " brightest"        -0.64
    "▬"                 -0.63
    " Manhattan"        -0.63
    "FORE"              -0.62
    Positive Logits
    Token               Logit
    "urous"             1.43
    "oise"              1.34
    "illas"             1.08
    "uring"             1.03
    "uous"              1.01
    "ured"              1.01
    "urers"             0.98
    "eur"               0.96
    "imer"              0.94
    "ures"              0.94
    Activation Density: 0.005%

    No Known Activations