    Negative Logits
    Token        Logit
    " man"       -2.86
    "Man"        -1.98
    " Man"       -1.94
    "man"        -1.89
    " MAN"       -1.75
    "MAN"        -1.42
    " mans"      -1.30
    " hombre"    -1.23
    " homem"     -1.19
    " mann"      -1.13
    Positive Logits
    Token        Logit
    "Men"        0.59
    " Men"       0.52
    " Regel"     0.49
    "men"        0.47
    " MEN"       0.47
    " of"        0.45
                 0.44
    "msub"       0.43
    "'"          0.43
    "Oby"        0.43
    Activation density: 0.272%

    No Known Activations