INDEX
    Explanations

    phrases related to human rights violations and political events

    phrases related to social issues and minority groups

    Negative Logits
     *)
    -0.70
    !)
    -0.62
     hindsight
    -0.57
    )!
    -0.56
    autions
    -0.56
     VIDEOS
    -0.56
    )</
    -0.54
    -)
    -0.52
     ?)
    -0.51
     broch
    -0.50
    Positive Logits
    ãĢĤ
    0.62
    .",
    0.61
    èĢ
    0.61
    atever
    0.59
     ".
    0.58
     TAMADRA
    0.57
    ".
    0.55
    hene
    0.55
     respectively
    0.54
    ',"
    0.54
    Activation Density 1.771%

    No Known Activations