    Explanations

    queries framed with the word "what."

    Negative Logits
    ummer
    -0.15
     thuận
    -0.15
    pad
    -0.15
    oubted
    -0.15
     anything
    -0.15
    多少
    -0.15
    CC
    -0.14
    FFE
    -0.14
    suite
    -0.14
    ongs
    -0.14
    Positive Logits
     about
    0.17
     do
    0.17
     if
    0.16
     effect
    0.15
     exactly
    0.15
     wenn
    0.14
    ird
    0.14
     choice
    0.14
     Harden
    0.14
    isine
    0.14
    Act Density 0.045%

    No Known Activations