INDEX
    Explanations

    questions prompting for knowledge or opinions

    questions that prompt an exploration of knowledge or information

    NEGATIVE LOGITS
    Ń           -0.68
    Hum         -0.67
    yssey       -0.66
    cour        -0.65
    ©¶æ¥µ       -0.64
    Init        -0.62
    inery       -0.61
     backdrop   -0.61
    Luck        -0.60
    ï¸ı         -0.60
    POSITIVE LOGITS
    ?'            1.06
    ?"            0.97
    ?:            0.94
     yourselves   0.94
    ?'"           0.92
    ?             0.91
    ?".           0.90
    ...?          0.87
    ?).           0.86
    ?ãĢį          0.85
    Activation Density: 0.141%

    No Known Activations