    Explanations

    phrases related to social and political issues, particularly regarding human rights and freedom of expression

    Negative Logits
        Token         Logit
        "tribute"     -0.14
        "jedn"        -0.14
        "unami"       -0.13
        "avras"       -0.13
        "ibu"         -0.13
        "731"         -0.13
        "inem"        -0.13
        "piece"       -0.13
        "isci"        -0.13
        "zos"         -0.13
    Positive Logits
        Token         Logit
        "Ðĭ"           0.14
        "525"          0.14
        "AutoSize"     0.14
        "545"          0.13
        " stabil"      0.13
        "oyal"         0.13
        "ÏĦÏį"         0.13
        " narr"        0.12
        "è¼Ķ"          0.12
        "LLU"          0.12
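    The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix and sort the result. The sketch below uses plain Python lists; `top_logit_tokens`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and real dashboards may normalize or filter tokens differently.

```python
def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction onto an
    unembedding matrix and return the k most negative and k most positive
    (token, logit) pairs.

    Assumed shapes: decoder_dir has d_model floats; W_U is a list of d_model
    rows, each with len(vocab) floats.
    """
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # logits[j] = sum_i decoder_dir[i] * W_U[i][j]  (a vector-matrix product)
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive logit
    ranked = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in ranked[:k]]
    # Largest logits first
    positive = [(vocab[j], logits[j]) for j in reversed(ranked[-k:])]
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary
neg, pos = top_logit_tokens(
    [1.0, 0.0],
    [[0.5, -0.2, 0.1, -0.4], [0.0, 0.0, 0.0, 0.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # most negative first
print(pos)  # most positive first
```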
    Activation Density: 0.224%
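    Activation density is read here as the percentage of token positions on which the feature fires, so 0.224% corresponds to roughly 224 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero; the function name `activation_density` and the `threshold` parameter are hypothetical, and the page does not state which threshold or token sample it used.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# A feature that fires on 224 of 100,000 tokens has a density of about 0.224%
sample = [1.0] * 224 + [0.0] * (100_000 - 224)
print(activation_density(sample))
```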

    No Known Activations