INDEX
    Explanations

    references to asking questions or making requests

    Negative Logits
    " Monfieur"         -0.78
    " Sodom"            -0.71
    "SourceChecksum"    -0.70
    " poffible"         -0.69
    " noastre"          -0.69
    " autorytatywna"    -0.68
    " Shakspeare"       -0.67
    " Efq"              -0.65
    " féminine"         -0.64
    " raiſ"             -0.63
    Positive Logits
    " yourself"         1.10
    " you"              1.09
    " your"             0.97
    " You"              0.95
    "你不"              0.86
    (token not shown)   0.83
    "yourself"          0.81
    "你还"              0.80
    " Yourself"         0.79
    (token not shown)   0.78
    Act Density 0.141%

    No Known Activations