    Explanations

    phrases related to human rights issues

    Negative Logits
        "urations"     -0.69
        "RET"          -0.68
        "eryl"         -0.68
        "forth"        -0.68
        " Transcript"  -0.67
        "kick"         -0.67
        "href"         -0.66
        "onne"         -0.65
        "RAG"          -0.65
        "creen"        -0.64
    Positive Logits
        " beings"      1.42
        "itarian"      1.24
        "itar"         1.19
        "oids"         1.09
        "itary"        0.99
        "istic"        0.97
        " embryonic"   0.96
        " rights"      0.95
        "zee"          0.94
        "izing"        0.93
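    The two logit lists look like the output of a logit-lens style projection: the feature's decoder direction multiplied by the model's unembedding matrix, then sorted. The page does not say how the scores are computed, so the sketch below simply assumes that convention. `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy numbers are made up for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, float]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (hypothetical): score[j] = sum_i decoder_dir[i] * unembed[i][j],
    i.e. the feature's decoder direction projected through the unembedding matrix.
    """
    scores = [0.0] * len(vocab)
    for i, weight in enumerate(decoder_dir):
        row = unembed[i]  # row i of the unembedding, one entry per vocabulary token
        for j in range(len(vocab)):
            scores[j] += weight * row[j]
    # Sort ascending by score, so the head holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]  # highest scores first
    return negative, positive


# Toy example with made-up numbers (d_model = 2, vocabulary of 3 tokens).
neg, pos = top_logit_effects(
    decoder_dir=[1.0, 0.5],
    unembed=[[1.0, -1.0, 0.0], [2.0, 0.0, -4.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print("negative:", neg)  # negative: [('c', -2.0)]
print("positive:", pos)  # positive: [('a', 2.0)]
```

    On a real model, `unembed` would be the d_model by vocab_size unembedding matrix, and a vectorized array library would replace the nested loops.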
    Activation Density 0.378%
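    Activation density is commonly read as the share of token positions on which the feature fires; under that reading, 0.378% works out to about 378 firing positions per 100,000 tokens. The sketch below assumes that definition and treats "firing" as an activation strictly above a `threshold` of zero. The `activation_density` helper is hypothetical, not taken from any particular tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = firing positions / all positions.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token position")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 1 of 4 token positions.
print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
```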

    No Known Activations