    Explanations

    expressions of disbelief or surprise

    Negative Logits
    Token           Logit
    "Hahahahaha"    -0.68
    "Hahahaha"      -0.66
    "ulipas"        -0.61
    "viedo"         -0.61
    " meras"        -0.60
    (not shown)     -0.60
    " €/"           -0.59
    " girasol"      -0.55
    " naran"        -0.55
    "})->"          -0.55

    Positive Logits
    Token           Logit
    " Oh"            0.91
    " oh"            0.86
    "Oh"             0.84
    " prouve"        0.81
    " ferait"        0.77
    " scrat"         0.74
    " OH"            0.72
    " pooh"          0.69
    " défend"        0.67
    " reconno"       0.67
    Activation Density: 0.032%

    No Known Activations