    Explanations

    instances of confrontation or arguments

    Negative Logits
    cx            -0.15
    infeld        -0.14
                  -0.14
    ensch         -0.14
     "$           -0.14
     “[           -0.14
    isay          -0.13
    ertz          -0.13
     Fol          -0.13
     stere        -0.13
    Positive Logits
    978           0.18
    alian         0.16
    your          0.16
    .""           0.15
     uh           0.14
     yourselves   0.14
    MOTE          0.14
     your         0.14
     yourself     0.14
     obsess       0.13
    Activation Density: 1.791%

    No Known Activations