INDEX
    Explanations

    phrases related to negative actions and behaviors towards others

    conjunctions and phrases indicating an ongoing relationship or connection between ideas

    Negative Logits
    " Yao"          -0.71
    " Shrine"       -0.64
    " Animation"    -0.63
    " Romans"       -0.62
    " Goo"          -0.62
    " Colts"        -0.62
    " NRL"          -0.61
    " Sox"          -0.60
    " Dungeons"     -0.59
    " Shots"        -0.59
    Positive Logits
    "rogen"         1.06
    "rogens"        1.04
    " distribute"   0.85
    " punish"       0.83
    " rehabilit"    0.83
    " humili"       0.80
    " analyse"      0.79
    "rew"           0.79
    " manipulate"   0.78
    " dispose"      0.77
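The page lists logit values without saying how they were obtained. Dashboards of this kind commonly score each vocabulary token by the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U, then report the top and bottom k. The sketch below assumes that method. The function names `logit_effects` and `top_logits`, the matrix shape convention, and all toy numbers are hypothetical and are not taken from this page.

```python
from typing import List, Tuple

# Hypothetical sketch: score every vocabulary token by the dot product of a
# feature's decoder direction with that token's column of the unembedding
# matrix W_U. Assumes w_u is laid out as d_model rows x vocab_size columns.
def logit_effects(
    decoder_dir: List[float], w_u: List[List[float]], vocab: List[str]
) -> List[Tuple[str, float]]:
    scores = []
    for j, tok in enumerate(vocab):
        score = sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, score))
    return scores


def top_logits(
    scores: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) token scores."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, four tokens).
vocab = [" punish", " Yao", "rogen", " Shrine"]
decoder_dir = [1.0, 0.5]
w_u = [
    [0.5, -0.4, 0.8, -0.3],
    [0.6, -0.6, 0.5, -0.2],
]
positive, negative = top_logits(logit_effects(decoder_dir, w_u, vocab), k=2)
for tok, score in positive + negative:
    print(f"{tok!r:>10} {score:+.2f}")
```

Real pipelines may also fold in layer-norm scaling or use a different normalization, so values from this sketch would not necessarily match the numbers listed above.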
    Activation Density: 0.131%
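Activation density is usually read as the share of token positions on which the feature is active. At 0.131%, that works out to roughly 131 tokens per 100,000, or about 1 in 760. The sketch below assumes a firing threshold of 0. The function name `activation_density` and the data are hypothetical.

```python
from typing import Sequence

# Hypothetical sketch: activation density as the fraction of token positions
# where the feature's activation exceeds a threshold (assumed 0 here).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up data: the feature fires on 131 of 100,000 tokens.
acts = [0.0] * 99_869 + [2.5] * 131
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints: Activation density: 0.131%
```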

    No Known Activations