INDEX
    Explanations

    the presence of the word "at" and related variations in various contexts

    Negative Logits
    es        -0.27
    ing       -0.26
    ed        -0.23
    hole      -0.20
    ho        -0.20
    halt      -0.20
    hb        -0.20
    eri       -0.20
    hoff      -0.19
    hill      -0.19
    Positive Logits
    ting       0.27
    tempts     0.24
    tempt      0.22
    ernal      0.21
    lı         0.20
    tement     0.20
    URNS       0.19
    ollah      0.18
    aylor      0.18
    sume       0.18
    Activation Density: 0.099%

    No Known Activations