INDEX
    Explanations

    negative sentiment and implications in the text

    Negative Logits
    Token               Logit
    (token not shown)   -0.45
    (token not shown)   -0.40
     */                 -0.39
    s                   -0.39
    ↵↵                  -0.39
     …                  -0.39
     *                  -0.39
    ...                 -0.38
    ↵↵↵                 -0.37
     \                  -0.36
    Positive Logits
    Token               Logit
     queſta             1.03
    <unused8>           1.02
    [@BOS@]             1.01
    <unused14>          1.01
    <unused43>          1.01
    <unused51>          1.01
    <unused42>          1.01
    <unused41>          1.01
    <unused16>          1.01
    <unused28>          1.01
    Activation Density: 0.024%

    No Known Activations