    Explanations

    statements about representation and social issues in media

    Negative Logits
    " muß"      -1.18
    " läßt"     -1.09
    " daß"      -1.08
    " müßte"    -0.99
    " Надо"     -0.99
    " Moslem"   -0.93
    "^(@)"      -0.92
    " idéia"    -0.88
    "Надо"      -0.88
    " Daß"      -0.87
    Positive Logits
    " Additionally"     0.81
    "Alright"           0.77
    "Additionally"      0.72
    " incentiv"         0.71
    " prioritizing"     0.70
    " prioritize"       0.70
    " impactful"        0.69
    " aforementioned"   0.68
    "createNewFile"     0.67
    " newfound"         0.66
    Activation Density: 0.549%

    No Known Activations