    Explanations

    expressions of personal beliefs and values regarding responsibility and communication

    Negative Logits
     Dont         -0.89
     doesnt       -0.88
     didnt        -0.86
    Dont          -0.84
     dont         -0.84
     fuckin       -0.84
     DONT         -0.80
     couldnt      -0.79
     isnt         -0.79
     wouldnt      -0.78
    Positive Logits
     >>           0.69
     --           0.63
     ♪            0.59
     kommenden    0.46
     "--          0.43
                  0.41
     vergangenen  0.40
     ontem        0.40
    ("--          0.39
    quehanna      0.38
    Activation Density: 0.036%

    No Known Activations