    Explanations

    question phrases starting with "what"

    Negative Logits
    Token         Logit
    "ulic"        -0.82
    "opian"       -0.66
    "idon"        -0.65
    "emp"         -0.65
    "using"       -0.65
    "DEF"         -0.64
    "por"         -0.63
    "ushima"      -0.63
    "uttering"    -0.62
    "iege"        -0.62
    Positive Logits
    Token             Logit
    " happens"        1.41
    " happened"       1.31
    " constitutes"    1.24
    " kinds"          1.23
    " kind"           1.16
    " else"           1.14
    " transpired"     1.13
    " sort"           1.09
    "soever"          1.08
    " percentage"     1.04
    Activation Density: 0.104%

    No Known Activations