    Explanations

    moments of high emotional intensity or impactful statements

    Negative Logits
    "ugi"       -0.15
    "atan"      -0.14
    " Salad"    -0.14
    " Tarih"    -0.14
    "ifornia"   -0.13
    " Perl"     -0.13
    " Dul"      -0.13
    " Lâm"      -0.13
    " Dawn"     -0.13
    "ubl"       -0.13
    Positive Logits
    "rve"       0.17
    "lue"       0.16
    "zburg"     0.15
    "eker"      0.15
    "etros"     0.15
    "fcn"       0.15
    " scand"    0.14
    "ource"     0.14
    "arlar"     0.14
    "asser"     0.14
    Activation Density: 0.061%

    No Known Activations