    Explanations

    references to universal concepts and human rights

    Negative Logits
    Token      Logit
    "ゥ"       -0.17
    "yonel"    -0.16
    "oret"     -0.15
    "ання"     -0.15
    "iw"       -0.15
    "esar"     -0.15
    "ONO"      -0.14
    "šli"      -0.14
    "eden"     -0.14
    "кет"      -0.14
    Positive Logits
    Token          Logit
    " Universal"   0.21
    "/global"      0.19
    " universal"   0.19
    "ist"          0.18
    "Universal"    0.18
    "adel"         0.18
    "iversal"      0.18
    "istic"        0.17
    " UNIVERS"     0.17
    " Studios"     0.17
    Activation Density: 0.011%

    No Known Activations