INDEX
    Explanations

    acronyms and abbreviations related to human rights and organizations

    Head Attr Weights
    0:0.03
    1:0.02
    2:0.06
    3:0.05
    4:0.04
    5:0.05
    6:0.42
    7:0.04
    8:0.05
    9:0.06
    10:0.07
    11:0.04
    Negative Logits
    "uania"       -1.58
    "verages"     -1.41
    " qualify"    -1.32
    "inav"        -1.30
    "gpu"         -1.29
    "minecraft"   -1.27
    "yden"        -1.22
    " nuance"     -1.21
    " boycott"    -1.17
    " predict"    -1.15
    Positive Logits
    " Bib"        1.50
    "Û"           1.50
    " Railway"    1.41
    "nown"        1.38
    "aroo"        1.31
    "apple"       1.30
    "aneers"      1.29
    "hens"        1.27
    "models"      1.24
    "arie"        1.24
    Act Density 0.001%

    No Known Activations