    Explanations

    phrases indicating positive qualities or actions

    instances of the word "good."

    Negative Logits
    eds        -0.79
    idon       -0.76
    eters      -0.72
    anwhile    -0.72
    ifles      -0.71
    chan       -0.67
    lees       -0.66
    hani       -0.66
    arthed     -0.66
    osures     -0.66
    Positive Logits
     chunk            1.15
    enough            1.13
    sword             0.99
     approximation    0.99
     enough           0.94
     deal             0.92
     ol               0.91
     idea             0.88
     amount           0.87
     example          0.86
    Act Density 0.068%

    No Known Activations