INDEX
    Explanations

    phrases related to physically impacting or reaching a target

    occurrences of the word "hit" in various contexts

    Negative Logits
    Token    Logit
    pires    -0.91
    agin     -0.75
    inent    -0.71
    æ©Ł      -0.70
    ç«       -0.68
    UTH      -0.66
    cia      -0.61
    ¥µ       -0.60
    algia    -0.60
    otive    -0.59
    Positive Logits
    Token    Logit
    ched     1.23
    ches     0.93
    boxes    0.86
    ted      0.82
    tle      0.81
    chens    0.79
    achi     0.79
    box      0.78
    ting     0.77
    ters     0.74
    Activation Density: 0.027%

    No Known Activations