    Negative Logits
                    -0.07
    "有人"          -0.07
    " büyük"        -0.07
    ".col"          -0.06
    " người"        -0.06
    "inge"          -0.06
    ",col"          -0.06
    "Delete"        -0.06
    "IColor"        -0.06
                    -0.06
    Positive Logits
    " refrain"      0.12
    " refr"         0.12
    "_AURA"         0.07
    " Refer"        0.07
    " Kara"         0.07
    " відмов"       0.07
    " remin"        0.06
    "imoto"         0.06
    "conversation"  0.06
    " 내가"         0.06
    Activation Density: 0.002%

    No Known Activations