INDEX
    Explanations

    adjectives indicating importance

    Negative Logits
    "holder"                  -0.07
    "нес"                     -0.07
    [token not rendered]      -0.07
    " Much"                   -0.07
    "-girl"                   -0.06
    " More"                   -0.06
    "іл"                      -0.06
    " tỏ"                     -0.06
    "aları"                   -0.06
    " Winn"                   -0.06

    Positive Logits
    ".↵"                      0.07
    " /↵"                     0.06
    [token not rendered]      0.06
    " --------------------"   0.06
    "[dir"                    0.06
    ")”"                      0.06
    "$class"                  0.06
    " Guinea"                 0.06
    "conj"                    0.06
    "Styled"                  0.06

    Activation Density: 0.064%

    No Known Activations