INDEX
    Explanations

    interrogative phrases or questions

    Negative Logits
    " they"        -0.17
    "they"         -0.15
    "kop"          -0.15
    "сты"          -0.15
    "itsu"         -0.14
    "escription"   -0.14
    "atile"        -0.14
    " it"          -0.14
    " wor"         -0.13
    "eso"          -0.13
    Positive Logits
    " do"          0.29
    " did"         0.22
    " does"        0.22
    " are"         0.21
    "是我"         0.20
    "do"           0.18
    ".do"          0.18
    " about"       0.17
    "did"          0.17
    " Does"        0.17
    Activation Density: 0.060%

    No Known Activations