INDEX
    Explanations

    instances of groups and organizations, especially in the context of societal issues

    Negative Logits
    avra
    -0.16
    idor
    -0.15
    aign
    -0.14
    izza
    -0.14
     something
    -0.14
     pic
    -0.14
    ft
    -0.14
    é¬
    -0.14
    yor
    -0.14
    agues
    -0.13
    Positive Logits
     whose
    0.24
     which
    0.21
     that
    0.21
     mà
    0.20
    that
    0.20
    which
    0.20
     которые
    0.20
    whose
    0.19
     которая
    0.18
     который
    0.18
    Act Density 0.142%

    No Known Activations