INDEX
    Explanations

    Characters and their roles, focusing on their behaviors and relationships in storytelling.

    Negative Logits
    Token           Logit
    "ilim"          -0.17
    ".uni"          -0.16
    "laughter"      -0.16
    "arent"         -0.15
    "ascar"         -0.15
    " disrespect"   -0.15
    "gnore"         -0.15
    "-gnu"          -0.14
    "iedo"          -0.14
    "edback"        -0.14
    Positive Logits
    Token           Logit
    " alo"          0.19
    " shl"          0.19
    " cipher"       0.19
    " tac"          0.18
    " bro"          0.17
    " clue"         0.17
    " su"           0.16
    "earn"          0.16
    " foil"         0.16
    " lik"          0.16
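The page does not say how the logit lists above were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix, score every vocabulary token, and report the most negative and most positive scores. The sketch below assumes that procedure. The functions `logit_effects` and `top_and_bottom` are hypothetical. The toy vocabulary and vectors are made up and are not this feature's weights.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]  # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-dimensional model with a 4-token vocabulary.
    decoder_dir = [1.0, 0.5]
    unembed = {
        " clue": [0.2, 0.0],
        " foil": [0.1, 0.1],
        "laughter": [-0.1, -0.1],
        "gnore": [0.0, -0.2],
    }
    negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("Negative:", [tok for tok, _ in negative])
    print("Positive:", [tok for tok, _ in positive])
```

In a real model the unembedding is usually stored as a single matrix of shape (d_model, vocab_size), so a matrix-vector product would replace the per-token loop. The dictionary form just keeps the toy example readable.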
    Activation Density: 0.120%
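Activation density is usually read as the share of token positions on which the feature fires. At 0.120%, that is about 12 positions in every 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0, which the page does not state. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (the default of 0 is an assumption, not stated on the page)."""
    acts = list(activations)
    if not acts:
        raise ValueError("activation_density needs at least one token position")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


if __name__ == "__main__":
    # Hypothetical sample: 10,000 token positions, 12 of which fire.
    sample = [0.0] * 9_988 + [1.3] * 12
    print(f"Activation density: {activation_density(sample):.3f}%")  # Activation density: 0.120%
```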

    No Known Activations