INDEX
    Explanations

    questions starting with the word "What" or "Who"

    Negative Logits
    Token         Logit
    "://"         -0.74
    "=\"#"        -0.68
    "avor"        -0.58
    " pistols"    -0.58
    " Heist"      -0.55
    "Mobil"       -0.55
    "KO"          -0.55
    "conv"        -0.55
    "zzy"         -0.55
    " corrid"     -0.54

    Positive Logits
    Token         Logit
    "ean"         0.79
    " uh"         0.72
    " somew"      0.71
    " Subst"      0.67
    "ersen"       0.67
    "ulhu"        0.65
    " um"         0.65
    " besides"    0.64
    "anwhile"     0.63
    "isphere"     0.62
    Activation Density: 0.100%

    No Known Activations