INDEX
    Explanations

    positive adjectives or descriptors

    Negative Logits
    gia      -0.16
    weg      -0.15
    gang     -0.15
    イル     -0.15
    #w       -0.15
    oux      -0.15
    jist     -0.14
    cy       -0.14
    inden    -0.14
    essel    -0.14
    Positive Logits
     Rig           0.16
    hk             0.16
     reput         0.15
    Propagation    0.14
    acre           0.14
    947            0.14
     Unblock       0.13
     спрос         0.13
    مع             0.13
    .typ           0.13
    Act Density 0.031%

    No Known Activations