INDEX
    Explanations

    phrases related to initiating or beginning actions

    instances of the word 'start' and its variations

    Negative Logits
    Token          Logit
    " entirety"    -0.74
    "obi"          -0.69
    "illard"       -0.66
    "phy"          -0.65
    "ighth"        -0.64
    "itsch"        -0.64
    " wrought"     -0.62
    "pedia"        -0.62
    "ocene"        -0.60
    "acho"         -0.59
    Positive Logits
    Token          Logit
    " anew"        1.03
    "nings"        0.85
    "starting"     0.79
    " behaving"    0.76
    " raining"     0.73
    "strap"        0.73
    "ribune"       0.72
    "rek"          0.69
    "ners"         0.69
    "around"       0.67
    Activation Density: 0.071%

    No Known Activations