INDEX
    Explanations

    touching or sensitive interactions between characters

    Negative Logits
     embodi       -0.99
     scrat        -0.95
     JOSÉ         -0.94
     maneu        -0.93
     impra        -0.93
    Cringe        -0.91
     downvote     -0.90
     pooh         -0.90
     guarante     -0.90
    Lmfao         -0.89
    Positive Logits
    Then          0.83
    <bos>         0.73
    After         0.70
    The           0.70
    But           0.70
    And           0.69
    This          0.68
    Everyone      0.67
    Finally       0.66
    When          0.65
    Activation Density 0.174%

    No Known Activations