INDEX
    Explanations

    People and positions of power

    Negative Logits
    Token            Logit
    " Nietzsche"     -0.07
    " groß"          -0.06
    "ricula"         -0.06
    " Fucking"       -0.06
    "ielding"        -0.06
    " صف"            -0.06
    " unfit"         -0.06
    "ене"            -0.06
    " подіб"         -0.06
    "941"            -0.06
    Positive Logits
    Token            Logit
    " )}↵"           0.07
    "BER"            0.06
    ",ll"            0.06
    "[obj"           0.06
    " transgender"   0.06
    "erland"         0.06
    ":↵"             0.06
    "deserialize"    0.06
    " розташ"        0.06
    "Monster"        0.06
    Activation Density: 0.072%

    No Known Activations