INDEX
    Explanations

    affirmative phrases or statements expressing satisfaction

    Negative Logits
    Token         Logit
    " Tato"       -0.17
    "uki"         -0.17
    "lector"      -0.16
    "obe"         -0.14
    "arov"        -0.14
    " itself"     -0.14
    "erable"      -0.14
    "очки"        -0.14
    "olt"         -0.14
    "ours"        -0.14

    Positive Logits
    Token         Logit
    " sure"        0.15
    "yll"          0.15
    "edy"          0.15
    " Proud"       0.15
    " currently"   0.15
    "apr"          0.14
    "edImage"      0.14
    "/Dk"          0.14
    "usz"          0.14
    "edo"          0.14
    Activation Density 0.073%

    No Known Activations