INDEX
    Explanations

    words related to various professions and social roles

    Negative Logits
    Token           Logit
    "uye"           -0.17
    "jiang"         -0.15
    "ÅĤa"           -0.14
    "rang"          -0.14
    "abal"          -0.14
    "ulet"          -0.14
    "was"           -0.14
    "ÏĦÎŃ"          -0.14
    "ighton"        -0.13
    " BÃł"          -0.13

    Positive Logits
    Token           Logit
    " often"        0.23
    " generally"    0.23
    "often"         0.19
    " Generally"    0.17
    " souvent"      0.17
    " typically"    0.17
    " usually"      0.16
    "Often"         0.16
    " Often"        0.16
    " notoriously"  0.15
    Activation Density: 0.173%

    No Known Activations