INDEX
    Explanations

    instances of the word "when."

    Negative Logits
    Token          Logit
    "them"         -0.19
    "ise"          -0.17
    "orsi"         -0.15
    " tieten"      -0.15
    "unately"      -0.15
    " herself"     -0.15
    "ize"          -0.14
    " пока"        -0.14
    "ly"           -0.14
    "ally"         -0.14
    Positive Logits
    Token          Logit
    "/if"          0.47
    "soever"       0.43
    "EVER"         0.33
    " they"        0.32
    " faced"       0.31
    " asked"       0.29
    " it"          0.29
    " we"          0.28
    "/how"         0.28
    " compared"    0.27
    Activation Density: 0.138%

    No Known Activations