    Explanations

    conceptions of complexity and dichotomy in various contexts

    Negative Logits
    Token        Logit
    "untime"     -0.16
    "ensing"     -0.15
    " Tomb"      -0.15
    "allo"       -0.15
    " Ley"       -0.15
    "ragon"      -0.14
    "esto"       -0.14
    "oleans"     -0.14
    "tü"         -0.14
    "okit"       -0.14
    Positive Logits
    Token        Logit
    " ones"      0.26
    " theirs"    0.18
    " íĥĦ"       0.16
    "htub"       0.16
    " Ones"      0.15
    "çijŁ"       0.15
    "vang"       0.15
    "ones"       0.15
    ".bz"        0.15
    " ours"      0.15
    Activation Density: 0.144%

    No Known Activations