    Explanations

    instances of emphasis or intensity in speech, such as words that suggest strong feelings or significant quantities

    Negative Logits
    /         -0.53
    `;        -0.52
    `,        -0.52
    ]`        -0.51
    "]}       -0.50
     rest     -0.50
    }$        -0.49
    __":      -0.48
     few      -0.48
    etti      -0.48
    Positive Logits
    !!!!!!           1.02
    !!!!!            0.97
    !!!!!!!          0.96
    !!!              0.95
    !!!!             0.95
     FUCKING         0.95
    (!)              0.91
     freakin         0.90
    !!!)             0.90
    WithIOException  0.89
    Act Density 0.226%

    No Known Activations