INDEX
    Explanations

    generic statements following a pattern, potentially related to advice or guiding principles

    phrases that emphasize positive qualities or rules

    Negative Logits
        "jri"             -0.74
        "gemony"          -0.67
        "cape"            -0.64
        " Himself"        -0.64
        "eds"             -0.63
        "uthor"           -0.61
        "ruption"         -0.61
        "eters"           -0.61
        " vanquished"     -0.60
        "hyde"            -0.60
    Positive Logits
        "enough"          1.10
        "reads"           1.08
        " example"        1.04
        " ol"             1.03
        "bye"             1.03
        " reason"         1.00
        "luck"            0.98
        " Samar"          0.95
        " approximation"  0.95
        " luck"           0.93
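    The page does not say how the logit values above were computed. Feature dashboards of this kind often score each vocabulary token by projecting the feature's decoder direction onto the model's unembedding matrix, then reading the most negative and most positive scores from opposite ends of one ranking. The sketch below assumes that method; the helper names (logit_effects, top_and_bottom_k) and the toy matrices are hypothetical and only illustrate the idea.

```python
def logit_effects(decoder_dir, unembed):
    """Score each vocabulary token as decoder_dir @ unembed.

    Assumption (hypothetical method): unembed is a d_model x vocab_size
    matrix given as a list of rows, and decoder_dir has length d_model.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[r] * unembed[r][j] for r in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom_k(effects, vocab, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])  # ascending
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (values made up).
vocab = ["enough", " reason", "jri", "gemony"]
unembed = [
    [1.0, 0.5, -0.7, -0.2],
    [0.1, 0.5, 0.0, -0.4],
]
decoder_dir = [1.0, 1.0]

effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom_k(effects, vocab, k=2)
for token, score in negative + positive:
    print(repr(token), round(score, 2))
```

    The quotes around each token matter: BPE-style tokenizers treat "luck" and " luck" as different tokens, which is why both appear in the positive list.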
    Act Density 0.068%

    No Known Activations