INDEX
    Explanations

    being subjected to judgment or mistreatment

    Negative Logits
    " encouraged"      0.46
    " allowed"         0.45
    "dn"               0.44
    " granted"         0.44
    " equipped"        0.41
    " instilled"       0.41
    " able"            0.41
    " ausgestattet"    0.39
    " unleashed"       0.39
    " seeking"         0.39
    Positive Logits
    " manipulated"     0.57
    " talked"          0.56
    " photographed"    0.55
    " chatted"         0.55
    " dominated"       0.53
    " sinned"          0.52
    " disagreed"       0.52
    "watched"          0.51
    "dominated"        0.51
    " watched"         0.50
    Act Density 0.016%

    No Known Activations