INDEX
    Explanations

    phrases related to decisive or impactful actions

    instances of the word "the" in various contexts

    Negative Logits
    Token         Logit
    "ttp"         -0.80
    "ornings"     -0.76
    "HAEL"        -0.76
    "ossal"       -0.75
    "-->"         -0.75
    "ogyn"        -0.74
    "deen"        -0.73
    "nces"        -0.72
    ">:"          -0.69
    "oine"        -0.69
    Positive Logits
    Token         Logit
    " envelope"   1.25
    " blame"      1.23
    " brakes"     1.22
    " needle"     1.13
    " ball"       1.07
    " curtain"    1.03
    " lid"        1.02
    " screws"     1.02
    " reins"      0.99
    " curtains"   0.97
    Activation Density: 0.161%

    No Known Activations