INDEX
    Explanations

    words related to storytelling and personal anecdotes

    conversational expressions and social interactions

    Negative Logits
    士    -0.81
    £ı    -0.78
    vre    -0.75
    unal    -0.68
     Flavoring    -0.67
     Perhaps    -0.65
    Enough    -0.64
    ufficient    -0.64
     Updated    -0.63
     safegu    -0.63
    Positive Logits
     kind    0.99
     uh    0.95
     kinda    0.93
     laughing    0.90
     ['    0.89
     fuckin    0.87
     saying    0.86
     yelling    0.85
     [    0.85
     grinning    0.84
    Activation Density: 0.414%

    No Known Activations