INDEX
    Explanations

    words related to destruction or strong actions

    the end of sentences or paragraphs in the text

    Negative Logits
     Niet           -0.69
     Frie           -0.61
     Vaugh          -0.59
     Pru            -0.52
    enegger         -0.52
     Leilan         -0.52
     Berm           -0.52
     Moroc          -0.50
     undermin       -0.50
     corrid         -0.50
    Positive Logits
    ":              0.60
    imum            0.49
    ciples          0.48
    phrine          0.48
    cised           0.47
    pedia           0.47
    DragonMagazine  0.47
    clinton         0.47
    pret            0.46
    ™:              0.46
    Act Density 0.390%

    No Known Activations