INDEX
    Negative Logits
    Token        Logit
    "なお"       -0.08
    "yt"         -0.08
    "w"          -0.08
    " Chapel"    -0.08
    " vowed"     -0.08
    "ındaki"     -0.07
    "xfd"        -0.07
    "imine"      -0.07
    "isting"     -0.07
    "curity"     -0.07
    Positive Logits
    Token        Logit
    " yes"       0.15
    " Yes"       0.14
    "Yes"        0.13
    " Oui"       0.11
    "yes"        0.11
    " Indeed"    0.10
    "हाँ"        0.10
    " yep"       0.10
    " yeah"      0.10
    ".Yes"       0.09
    Activation Density: 0.069%

    No Known Activations