INDEX
    Explanations

    expressions of humor or laughter

    Negative Logits
    Token     Logit
    ?");      -0.82
    )";       -0.80
    .",       -0.79
    %");      -0.79
    ?")       -0.78
    "),       -0.76
     */;      -0.76
    .")       -0.76
    :");      -0.75
    .」       -0.72
    Positive Logits
    Token          Logit
    <eos>          0.74
    ↵↵             0.64
     But           0.62
     And           0.61
     This          0.60
    !              0.59
     Especially    0.57
     They          0.57
     (             0.57
     I             0.57
    Activation Density: 0.145%

    No Known Activations