    Explanations

    conversational phrases and comments directed at the audience

    Negative Logits
    Token           Logit
    " Woman"        -0.15
    "woman"         -0.15
    "vet"           -0.14
    " newcomer"     -0.14
    "zel"           -0.14
    " Ihrer"        -0.14
    " fucking"      -0.14
    "man"           -0.13
    "cox"           -0.13
    "ponent"        -0.13
    Positive Logits
    Token           Logit
    " folks"        0.51
    " guys"         0.42
    "fol"           0.39
    " Fol"          0.38
    " ladies"       0.33
    " everybody"    0.32
    " everyone"     0.32
    " folk"         0.30
    " friends"      0.30
    " Guys"         0.28
    Activation Density: 0.155%

    No Known Activations