    Explanations

    phrases expressing concern or references to historical context and societal issues

    Negative Logits
    Token         Logit
    "ubar"        -0.15
    "тро"         -0.14
    "UED"         -0.13
    "CISION"      -0.13
    "urd"         -0.12
    " zij"        -0.12
    "ulo"         -0.12
    "ンチ"        -0.12
    "っき"        -0.12
    "assen"       -0.12
    Positive Logits
    Token         Logit
    " us"         1.40
    " me"         0.74
    " нами"       0.67
    "我们"        0.65
    "-us"         0.60
    " nosotros"   0.59
    "us"          0.59
    " we"         0.58
    " Us"         0.57
    " nous"       0.56
    Activation Density: 1.568%

    No Known Activations