    Explanations

    questions and expressions of curiosity or uncertainty

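    Negative Logits
    (Tokens are quoted so that leading spaces are visible.)
        "I"           -0.16
        ".pkg"        -0.14
        "лки"         -0.14
        "cre"         -0.13
        "lod"         -0.13
        "no"          -0.13
        " itself"     -0.13
        "WH"          -0.13
        "anyl"        -0.13
        "A"           -0.13

    Positive Logits
        " how"         0.45
        " whether"     0.41
        "how"          0.32
        "whether"      0.30
        " what"        0.30
        " why"         0.29
        " Whether"     0.28
        " cómo"        0.28   (Spanish "how")
        "是否"         0.27   (Chinese "whether")
        "Whether"      0.26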
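    The two lists above rank vocabulary tokens by how much this feature appears to raise or lower their output logits. A minimal sketch of such a ranking in pure Python follows. It assumes (this page does not confirm it) that each score is the dot product of the feature's decoder direction with the matching column of the model's unembedding matrix. The function name `logit_effects` and all numbers in the toy example are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Rank tokens by the logit change a feature direction induces.

    direction: list of d_model floats (hypothetical decoder vector for the feature)
    unembed:   d_model x vocab_size nested list (hypothetical unembedding matrix W_U)
    vocab:     list of token strings, one per unembedding column
    Returns (top_positive, top_negative), each a list of (token, score) pairs.
    """
    d_model = len(direction)
    # Score for token j = dot product of the direction with column j of W_U.
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative scores first
    top_positive = ranked[::-1][:k]  # most positive scores first
    return top_positive, top_negative


# Toy example: d_model = 2 and a four-token vocabulary (made-up numbers).
direction = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, 0.0],
    [0.1, 0.2, -0.1, -0.2],
]
vocab = [" how", " whether", "I", ".pkg"]
pos, neg = logit_effects(direction, unembed, vocab, k=2)
print(pos)
print(neg)
```

    Because it only sorts one score vector, the same function yields both lists. A real model would use a vectorized matrix product over the full vocabulary instead of nested Python loops.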
    Activation density: 0.179%
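    Activation density is usually read as the share of tokens on which the feature fires at all. A minimal sketch follows. It assumes a simple "activation > 0" threshold, which this page does not state. `activation_density` is a hypothetical helper, and the activations are made up.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    # Count tokens where the feature is active (strictly above the threshold).
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations: 2 active tokens out of 1,000.
acts = [0.0] * 998 + [0.5, 1.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.200%
```

    Read this way, a density of 0.179% means the feature fires on roughly 179 of every 100,000 tokens.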

    No Known Activations