    Explanations

    expressions of disbelief or sarcasm

    Negative Logits
    Token               Logit
    erdings             -0.58
     algemeen           -0.57
    MLLoader            -0.57
    ")){\n              -0.57
    !")\n               -0.57
    IntoConstraints     -0.56
    importanza          -0.56
     möjlighet          -0.56
    Personensuche       -0.55
    setHorizontal       -0.54
    Positive Logits
    Token               Logit
     why                0.96
     Shame              0.93
     Why                0.89
     shame              0.88
    Why                 0.83
    Shame               0.82
    why                 0.81
     pathetic           0.80
    shame               0.78
     shameful           0.76
    Activation Density: 0.334%

    No Known Activations