    Negative Logits
    атки        -0.07
     PARK       -0.07
    _detach     -0.07
     Rooney     -0.06
    .Areas      -0.06
     teasing    -0.06
    _share      -0.06
                -0.06
    ль          -0.06
     tắt        -0.06
    Positive Logits
     disrupt    0.07
     uid        0.07
     Ethiopia   0.06
     Hunting    0.06
     coli       0.06
     Cuba       0.06
     Enterprise 0.06
     heals      0.06
     ',↵        0.06
    compiler    0.05
    Activation Density: 0.233%

    No Known Activations