INDEX
    Explanations

    phrases or sentences asking for information or decisions

    questions or phrases beginning with "what."

    Negative Logits
    "ulic"      -0.75
    "enburg"    -0.71
    "ubs"       -0.69
    "robe"      -0.69
    "ped"       -0.67
    "ster"      -0.67
    "enberg"    -0.65
    "eah"       -0.64
    "aches"     -0.63
    "trop"      -0.63
    Positive Logits
    " happened"      1.23
    " happens"       1.14
    "soever"         1.14
    " kinds"         1.07
    " sorts"         1.07
    " happ"          1.06
    " transpired"    1.04
    " else"          0.93
    " exactly"       0.90
    " constitutes"   0.88
    Act Density 0.104%

    No Known Activations