INDEX
    Explanations

    questions that begin with "Did" or "did" followed by a pronoun
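
    The explanation above describes a surface pattern, which can be approximated with a regular expression. The sketch below is only an illustration of that description, not how the feature itself is computed: the pronoun list, the `DID_PRONOUN` pattern, and the `matches_description` helper are all assumptions introduced here, and treating a "question" as text ending in `?` is a simplification.

```python
import re

# Hypothetical pronoun list, chosen for illustration only (not from the source).
PRONOUNS = ("i", "you", "he", "she", "it", "we", "they")

# "Did" or "did" at the start of the text, then whitespace, then a pronoun
# as a whole word. The pronoun group alone is matched case-insensitively.
DID_PRONOUN = re.compile(r"^\s*[Dd]id\s+(?i:" + "|".join(PRONOUNS) + r")\b")


def matches_description(text: str) -> bool:
    """Return True if text looks like a question starting with Did/did + pronoun."""
    is_question = text.rstrip().endswith("?")  # simplifying assumption
    return is_question and DID_PRONOUN.match(text) is not None


if __name__ == "__main__":
    for sample in ["Did you see it?", "did he leave?", "Did the dog bark?"]:
        print(repr(sample), matches_description(sample))
```

    A check like this can serve as a rough filter when collecting candidate prompts to test the explanation against; it says nothing about how strongly the feature would activate on any of them.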

    Negative Logits
    Token       Logit
    "sto"       -0.16
    "ÏĦÏī"      -0.16
    "izer"      -0.15
    "indre"     -0.15
    "ozy"       -0.14
    " quali"    -0.14
    "illin"     -0.14
    "_HC"       -0.14
    "idor"      -0.13
    "ähr"       -0.13

    Positive Logits
    Token       Logit
    "bah"       0.14
    "ijken"     0.14
    "ponge"     0.14
    "ijo"       0.14
    "raise"     0.14
    "afort"     0.14
    "allery"    0.14
    " Bij"      0.13
    "flen"      0.13
    "UTE"       0.13
    Activation Density: 0.041%

    No Known Activations