INDEX
    Explanations

    instances of direct speech or dialogue

    Negative Logits
    [@BOS@]       -0.83
    <unused23>    -0.83
    <unused8>     -0.82
    <unused14>    -0.82
    <unused51>    -0.82
    <unused68>    -0.82
    <unused47>    -0.82
    <unused42>    -0.82
    <unused28>    -0.82
    <unused41>    -0.82
    Positive Logits
    '       0.35
    Is      0.31
    What    0.31
    1       0.30
    2       0.28
    I       0.28
    SP      0.27
    You     0.27
            0.27
    If      0.27
    Activation Density: 0.023%

    No Known Activations