    Explanations

    expressions of humor or lightheartedness

    Negative Logits
    Token                Logit
    )";                  -0.80
    (token not shown)    -0.77
    "],                  -0.75
    (token not shown)    -0.72
    "},                  -0.71
    .",                  -0.70
    '>                   -0.70
    Revenir              -0.69
    "),                  -0.68
    )");                 -0.67

    Positive Logits (␣ marks a leading space in the token)
    Token                Logit
    ␣:)                  0.93
    ␣:).                 0.76
    ␣:-)                 0.76
    ␣;)                  0.70
    ␣🙂                  0.69
    Vaya                 0.68
    ␣phenotypes          0.68
    :)                   0.65
    ␣:))                 0.64
    ␣;-)                 0.64
    Act Density 0.080%

    No Known Activations