    Explanations

    occurrences of the word "step"

    Negative Logits
    Token           Logit
    " Unic"         -0.76
    "orem"          -0.75
    " Pengu"        -0.74
    "ortunately"    -0.72
    "ILLE"          -0.69
    " tiss"         -0.69
    "eatures"       -0.69
    "ominated"      -0.67
    "essage"        -0.65
    "inately"       -0.64

    Positive Logits
    Token           Logit
    "daughter"      1.18
    "dad"           1.11
    "brother"       1.04
    "mother"        0.98
    "father"        0.98
    "hens"          0.89
    "isters"        0.88
    "steps"         0.88
    "mom"           0.84
    "step"          0.83

    Activation Density: 0.017%

    No Known Activations