INDEX
    NEGATIVE LOGITS
    "ofire"  -0.07
    "ouis"   -0.07
    "ido"    -0.06
    "atha"   -0.06
    "ISO"    -0.06
    "OTE"    -0.06
    "wel"    -0.06
    "XB"     -0.06
    " Hir"   -0.06
    "aec"    -0.06
    POSITIVE LOGITS
    " Man"  0.17
    "Man"   0.15
    " MAN"  0.13
    ".Man"  0.12
    " man"  0.12
    "(man"  0.11
    "_man"  0.11
    "-Man"  0.11
    "_Man"  0.10
    "-man"  0.10
    Activation Density: 0.020%

    No Known Activations