INDEX
    Explanations

    punctuation and formatting elements, suggesting a focus on conversational or interactive language

    Negative Logits
    "ingham"     -0.16
    "loth"       -0.14
    "ardi"       -0.14
    "yna"        -0.13
    " gap"       -0.13
    "pper"       -0.13
    " Cooke"     -0.13
    "irá"        -0.13
    "iao"        -0.13
    " adress"    -0.13
    Positive Logits
    " Episode"      0.24
    " listener"     0.23
    " listeners"    0.23
    " tune"         0.22
    " episode"      0.22
    "Episode"       0.22
    " Segment"      0.21
    " Listener"     0.20
    "Listener"      0.20
    "episode"       0.20
    Activation Density: 0.052%

    No Known Activations