INDEX
    Negative Logits
    Token             Logit
    "ſelf"            -0.88
    "Nuorodos"        -0.79
    "脚注の使い方"      -0.78
    "ſelves"          -0.76
    " Winaray"        -0.76
    "DockStyle"       -0.75
    "herself"         -0.75
    "bewerken"        -0.73
    ":✨"              -0.73
    " myſelf"         -0.72
    Positive Logits
    Token             Logit
    " they"            1.08
    " of"              0.94
    " we"              0.86
    " he"              0.72
    " it"              0.70
    " there"           0.69
                       0.66
    " nobody"          0.65
    "ing"              0.64
    " you"             0.63
    Activation Density: 0.081%

    No Known Activations