INDEX
    Explanations

    phrases expressing understanding or comprehension

    expressions of understanding or confusion

    Negative Logits
    velt            -0.65
    worthy          -0.63
     foss           -0.59
    erness          -0.59
    haus            -0.59
    agen            -0.59
     feasibility    -0.58
     ante           -0.57
    yet             -0.57
     Britann        -0.57
    Positive Logits
     bored          0.98
     tired          0.95
     annoyed        0.93
     rid            0.93
     distracted     0.90
     punished       0.86
     yelled         0.84
     irritated      0.83
     sucked         0.82
     confused       0.82
    Activation Density: 0.080%

    No Known Activations