    Explanations

    words related to negative events, disasters, and challenges

    Negative Logits
        "sonder"          -0.60
        "hever"           -0.55
        " pfe"            -0.54
        "elee"            -0.54
        " wille"          -0.52
        " individuel"     -0.52
        "tothe"           -0.50
        "heyd"            -0.50
        " Neub"           -0.50
        "esss"            -0.49
    Positive Logits
                           0.66
        " tetrach"         0.65
        " popoli"          0.65
        " kasama"          0.64
        "ffilm"            0.63
        " affatto"         0.62
        " venuto"          0.62
        "<bos>"            0.61
        " kanya"           0.61
        " papà"            0.59
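On feature dashboards of this kind, the negative and positive logit lists are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. The sketch below assumes that definition. The helper names `logit_effects` and `top_logit_effects` and the toy vectors are hypothetical, and a real model would use a tensor library rather than plain lists.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (assumed definition, hypothetical helper)."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_logit_effects(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # descending by score
    return most_negative, most_positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary (hypothetical values).
decoder_dir = [1.0, -1.0]
unembed = {"a": [0.75, 0.25], "b": [0.0, 0.5], "c": [0.25, 0.25]}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_logit_effects(effects, k=1)
print(negative, positive)  # [('b', -0.5)] [('a', 0.5)]
```

Sorting once and slicing from both ends yields both lists in a single pass over the vocabulary scores.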
    Activation Density 0.390%
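Activation density is commonly defined as the fraction of tokens in a sample on which the feature's activation is nonzero. Under that definition (the zero threshold is an assumption), 0.390% means the feature fired on roughly 39 of every 10,000 tokens. A minimal sketch, using the hypothetical helper `activation_density`:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition, hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 39 active tokens out of 10,000 (hypothetical sample).
sample = [1.0] * 39 + [0.0] * 9961
print(f"{activation_density(sample):.3%}")  # 0.390%
```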

    No Known Activations