    Explanations

    the word "when" followed by a numerical value

    Negative Logits
    Token        Logit
    "enegger"    -0.74
    "kaya"       -0.71
    "gur"        -0.69
    "zzi"        -0.68
    "yi"         -0.66
    "feature"    -0.65
    "edly"       -0.62
    "hid"        -0.61
    "hire"       -0.61
    "chens"      -0.61
    Positive Logits
    Token         Logit
    "soever"      1.04
    " exactly"    0.95
    "abouts"      0.79
    "ce"          0.79
    "irlf"        0.78
    " they"       0.73
    " someone"    0.71
    " puberty"    0.65
    " we"         0.65
    " faced"      0.65
    Act Density 0.078%

    No Known Activations