INDEX
    Explanations

    interrogative words or phrases related to questions

    Negative Logits
    Token               Logit
    "ContentAlignment"  -0.15
    "sto"               -0.14
    " happens"          -0.14
    " иÑģÑĤ"             -0.14
    "æ©"                -0.14
    " fid"              -0.13
    "acho"              -0.13
    "yle"               -0.13
    " wor"              -0.13
    " hypoth"           -0.13

    Positive Logits
    Token               Logit
    " advice"           0.20
    " drew"             0.20
    " made"             0.19
    " do"               0.18
    " Advice"           0.18
    " Draws"            0.18
    " brought"          0.16
    " draws"            0.16
    " appealed"         0.16
    "Advice"            0.16
    Activation Density 0.040%

    No Known Activations