    Explanations

    phrases indicating objectives or actions to be completed

    Negative Logits
    ладу
    -0.15
    vem
    -0.14
     Gabriel
    -0.14
    или
    -0.14
    upertino
    -0.14
    iol
    -0.14
    озв
    -0.13
    ินด
    -0.13
     err
    -0.13
     beg
    -0.13
    Positive Logits
    istrovství
    0.17
    ieder
    0.15
    тю
    0.15
    unch
    0.14
     Starr
    0.14
    lix
    0.14
    trag
    0.14
    AGMA
    0.13
    ickers
    0.13
    368
    0.13
    Act Density 0.110%

    No Known Activations