INDEX
    Explanations

    phrases indicating the beginning or initiation of a process or action

    instances of the word "started."

    Negative Logits
    "obi"            -0.81
    "↑"              -0.76
    " entirety"      -0.67
    "omb"            -0.67
    "ugs"            -0.66
    "warts"          -0.64
    "cit"            -0.63
    "airy"           -0.63
    "cedented"       -0.63
    "cation"         -0.62

    Positive Logits
    " anew"           1.12
    " noticing"       0.92
    " experimenting"  0.89
    " behaving"       0.84
    " bothering"      0.82
    " researching"    0.82
    " accumulating"   0.80
    " dating"         0.80
    " acting"         0.76
    " deleting"       0.75

    Act Density 0.068%

    No Known Activations