    Explanations

    references to specific groups or categories of people

    Negative Logits
        ' Suom'         -0.57
        ' Puig'         -0.56
        ' Wilber'       -0.55
        ' Gedichte'     -0.55
        ' میل'          -0.53
        ' Isma'         -0.51
        ' pytanie'      -0.50
        ' Lancelot'     -0.50
        ' Änder'        -0.49
        ' Jum'          -0.49

    Positive Logits
        ' those'         1.31
        'Those'          1.25
        'those'          1.19
        ' Those'         1.19
        ' THOSE'         1.13
        ' these'         1.00
        ' pesky'         1.00
        '%")'            0.94
        ' चीज़ों'          0.93
        ' Những'         0.91
    Activation Density: 0.031%

    No Known Activations