INDEX
    Explanations

    questions starting with "How does" or "Does."

    Negative Logits
    hig         -0.73
    bis         -0.68
    xon         -0.68
    iken        -0.68
    arer        -0.65
    bsp         -0.64
    ullivan     -0.64
    lla         -0.62
    iem         -0.61
    psc         -0.61
    Positive Logits
    ?!          1.15
    ?           1.13
    ?]          1.12
    ?),         1.11
    ?)          1.10
    ?!"         1.08
    ?"          1.04
    ?:          1.02
    ?).         1.02
    ?",         1.01
    Act Density 0.421%

    No Known Activations