INDEX
    Explanations

    references to social demographics and representation

    after single letters

    Negative Logits
     CommandType    -0.59
     mentre         -0.59
     versus         -0.57
     Versus         -0.56
     vs             -0.56
     Whereas        -0.54
    不像             -0.54
     متعلقه         -0.53
     betweenstory   -0.53
     Vs             -0.53
    Positive Logits
     itſelf         0.69
     purpoſe        0.65
     наоборот       0.64
     pleaſure       0.63
     defire         0.63
     houſe          0.63
     diſt           0.63
     neceff         0.63
     Reſ            0.61
     ones           0.61
    Activation Density: 0.772%

    No Known Activations