INDEX
    Explanations

    the phrase "no idea"

    expressions of uncertainty or lack of understanding

    Negative Logits
    Token         Logit
    "Reviewed"    -0.67
    "wagen"       -0.60
    "die"         -0.60
    "jan"         -0.59
    "Pers"        -0.59
    "king"        -0.59
    "ocally"      -0.59
    "ouri"        -0.58
    "istent"      -0.58
    "ãĥĪ"         -0.56
    Positive Logits
    Token            Logit
    " why"           1.30
    " how"           1.19
    " WHY"           1.10
    "why"            1.09
    " whats"         1.06
    " HOW"           1.02
    " what"          0.99
    " whence"        0.96
    " whatsoever"    0.94
    "how"            0.91
    Activation Density: 0.051%
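
    The density figure can be read as the share of token positions on which the feature fires. That reading is an assumption about how this page computes it, not something the page states. Under it, 0.051% corresponds to roughly 51 active positions per 100,000 tokens. A minimal sketch of the arithmetic, with a hypothetical helper name and made-up sample data:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation. The page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 51 of which are active.
sample = [0.0] * 99_949 + [1.5] * 51
density = activation_density(sample)
print(f"{density:.3%}")
```

    The threshold defaults to zero because sparse features are usually inactive on most tokens; a tool that counts only activations above some cutoff would report a lower figure.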

    No Known Activations