INDEX
    Explanations

    questions and phrases indicating inquiry or curiosity

    Negative Logits
     what      -0.19
    .what      -0.15
     frank     -0.15
    what       -0.14
    476        -0.14
    zb         -0.14
    ator       -0.14
    ovah       -0.13
    uz         -0.13
    åı         -0.13
    Positive Logits
    -count     0.17
    onec       0.15
    iyat       0.15
    lesson     0.14
     place     0.14
    /place     0.14
     motiv     0.14
     meaning   0.14
    ä¸ģ        0.14
     aile      0.14
    Act Density 0.109%

    No Known Activations