INDEX
    Explanations

    words indicating similarities or comparisons

    Negative Logits
    Token       Logit
    "essa"      -0.17
    "ром"       -0.15
    "redient"   -0.15
    "urai"      -0.14
    "esktop"    -0.14
    "raki"      -0.13
    "antar"     -0.13
    "ilip"      -0.13
    "uled"      -0.13
    " jinak"    -0.13
    Positive Logits
    Token       Logit
    " nhau"     0.26
    " those"    0.22
    " what"     0.21
    " собою"    0.19
    " ours"     0.19
    " unto"     0.19
    " собой"    0.19
    " other"    0.17
    "those"     0.17
    " ones"     0.17
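    The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way dashboards produce such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that method (it ignores the final LayerNorm a real model applies first); the function name `top_logit_effects` and the toy weights are hypothetical, not this feature's actual data.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (a sketch).

    decoder_dir: (d_model,) decoder vector for the feature (assumed input).
    W_U: (d_model, d_vocab) unembedding matrix (assumed input).
    vocab: list of d_vocab token strings.
    Assumption: the final LayerNorm is ignored here.
    """
    logits = decoder_dir @ W_U  # (d_vocab,) logit effect per token
    order = np.argsort(logits)  # ascending, so most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up numbers (hypothetical, not real model weights).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.2, -0.1, 0.0, 0.3],
                [0.5, 0.4, -0.2, 0.1]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('b', -0.1), ('c', 0.0)]
print(pos)  # [('d', 0.3), ('a', 0.2)]
```

    Sorting the full projection once and slicing both ends gives the negative and positive lists from a single pass over the vocabulary.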
    Activation Density 0.158%
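    A density of 0.158% means the feature fires on roughly one token in every 630 (1 / 0.00158 ≈ 633). The sketch below assumes density is the percentage of sampled token positions where the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds threshold.

    Assumption: a feature "fires" when its activation is strictly > threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy data: 16 firing positions out of 10,000 tokens (made-up numbers).
acts = np.zeros(10_000)
acts[:16] = 1.0
print(activation_density(acts))  # 0.16
```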

    No Known Activations