INDEX
    Explanations

    mentions of how people are treated by others

    references to the concept of treatment or being treated in various contexts

    Negative Logits
    Token         Logit
    "aer"         -0.72
    "azi"         -0.71
    "audi"        -0.68
    "sky"         -0.63
    "direction"   -0.61
    "sign"        -0.60
    " Origin"     -0.60
    "adra"        -0.59
    " Rae"        -0.58
    "activated"   -0.58
    Positive Logits
    Token         Logit
    "ttes"        0.87
    "ricular"     0.83
    " treated"    0.79
    "reatment"    0.78
    "terson"      0.76
    "iments"      0.75
    "ivated"      0.75
    "illance"     0.74
    " pione"      0.73
    "htaking"     0.72
    Activation Density: 0.019%

    No Known Activations