INDEX
    Explanations

    references to specific groups or categories of individuals

    Negative Logits
    " entire"    -0.18
    "etta"       -0.15
    "Když"       -0.15
    "æķ´ä¸ª"     -0.15
    ".rdf"       -0.15
    " whole"     -0.14
    "ÑĪин"       -0.14
    "åijĬ"        -0.14
    "cestor"     -0.14
    "[++"        -0.13
    Positive Logits
    " only"      0.22
    " Only"      0.18
    "only"       0.17
    " ones"      0.17
    " NONE"      0.16
    "åıªæľī"     0.16
    "NONE"       0.15
    " none"      0.15
    " ONLY"      0.15
    "Only"       0.14
    Activation Density: 0.050%

    No Known Activations