INDEX
    Explanations

    words related to cooking and food preparation

    actions taking place in specific scenarios

    Negative Logits
    "Fact"            -0.76
    " Osw"            -0.70
    "clinton"         -0.69
    "paralle"         -0.69
    "avier"           -0.69
    "ormon"           -0.68
    "arton"           -0.68
    "particularly"    -0.68
    "arij"            -0.68
    " Correct"        -0.67
    Positive Logits
    " oblivious"       1.02
    " hordes"          0.98
    " screaming"       0.98
    " unsuspecting"    0.95
    " endless"         0.94
    " endlessly"       0.90
    " goddamn"         0.90
    " drunken"         0.89
    " waving"          0.89
    " dudes"           0.89
    Activation Density: 0.771%

    No Known Activations