INDEX
    Explanations

    references to actions involving conflict or confrontation

    references to violent incidents and their implications

    Negative Logits
     his        -0.61
    his         -0.60
     their      -0.55
     HIS        -0.55
     inexper    -0.52
     intended   -0.52
     herself    -0.52
     thy        -0.51
     lest       -0.51
    illary      -0.51
    Positive Logits
    ãĤ¼ãĤ¦ãĤ¹   0.70
    Ñı          0.66
    Lots        0.65
    æĺ¯         0.64
    Category    0.59
    "}],"       0.59
    ãĥ´ãĤ¡      0.57
    \/\/        0.57
    Ò           0.56
    ocaly       0.56
    Act Density 0.829%

    No Known Activations