INDEX
    Explanations

    words related to transitions or changes in situations

    Negative Logits
    ube           -0.17
    stroy         -0.15
    143           -0.14
    Enumerator    -0.14
    dn            -0.14
    rame          -0.14
    aleb          -0.14
    ikit          -0.14
     ramp         -0.14
    lbrace        -0.14
    Positive Logits
    íݸ           0.18
    olley         0.16
    ijken         0.15
    alink         0.15
    arte          0.15
     pars         0.14
    ãĢħ           0.14
    غÙĬر          0.13
    aran          0.13
    bai           0.13
    Activation Density: 0.277%

    No Known Activations