INDEX
    Explanations

    expressions of uncertainty or confusion

    Negative Logits
    ÃĸL       -0.18
    ulumi     -0.17
    ÃľM       -0.16
    ħ§        -0.15
    ÏĢÎŃ      -0.15
    xm        -0.15
    ÏĦÏĥι     -0.15
    jez       -0.15
    надлеж    -0.15
    eless     -0.15
    Positive Logits
     wa       0.33
     w        0.29
     bow      0.25
     ho       0.24
     wat      0.20
     wh       0.20
     Bow      0.20
    -w        0.20
     want     0.20
     hat      0.19
    Act Density 0.183%

    No Known Activations