INDEX
    Explanations

    concepts related to human rights

    Negative Logits
    Token         Logit
    "HEAD"        -0.79
    "URRENT"      -0.75
    "-+-+-+-+"    -0.73
    "-+-+"        -0.71
    "BALL"        -0.70
    " Plains"     -0.66
    "STRUCT"      -0.66
    "AST"         -0.65
    "````"        -0.64
    "ALSE"        -0.64
    Positive Logits
    Token            Logit
    " rights"        1.35
    "rights"         1.35
    " Rights"        1.16
    " abuses"        1.06
    "ktop"           1.00
    " protections"   0.89
    "tarians"        0.86
    "yright"         0.86
    " equality"      0.85
    " distribut"     0.85
    Activation Density: 0.020%

    No Known Activations