    Explanations

    references to specific individuals or characters in various contexts

    Negative Logits
     himself
    -0.18
     Himself
    -0.18
    unga
    -0.18
    opr
    -0.16
    .libs
    -0.15
    ерж
    -0.14
     nào
    -0.14
     sám
    -0.14
    ungi
    -0.14
    ní
    -0.14
    Positive Logits
     alike
    0.42
     respectively
    0.33
     both
    0.28
     themselves
    0.28
     BOTH
    0.25
    both
    0.25
     sowie
    0.24
     각각
    0.24
     serta
    0.23
     their
    0.23
    Act Density 0.236%

    No Known Activations