    Explanations

    references to human-centric concepts and rights

    Negative Logits
    Token              Logit
    " aniversario"     -0.73
    " Trotz"           -0.68
    "ValueStyle"       -0.68
    " Diwali"          -0.67
    "łość"             -0.65
    " aikana"          -0.64
    " productivo"      -0.64
    " isles"           -0.63
    " Nomenclature"    -0.62
    "kulum"            -0.62
    Positive Logits
    Token              Logit
    " human"           2.47
    "human"            2.21
    " Human"           2.19
    " HUMAN"           2.17
    "Human"            2.15
    "HUMAN"            2.06
    " humans"          1.93
    " Humans"          1.74
    " humano"          1.72
    " humanos"         1.69
    Activation Density: 0.084%

    No Known Activations