    Explanations

    occurrences of the word "that."

    Negative Logits
    Token        Logit
    "ABCDE"      -0.15
    "ogue"       -0.15
    "rech"       -0.15
    "tlement"    -0.15
    " Sel"       -0.15
    "tero"       -0.14
    "ottage"     -0.14
    "artz"       -0.14
    "_drawer"    -0.13
    "ngen"       -0.13
    Positive Logits
    Token           Logit
    "zeit"          0.15
    "raft"          0.15
    "istrovství"    0.14
    "dash"          0.14
    " Osw"          0.14
    "dar"           0.14
    "帖"            0.14
    "ests"          0.14
    "-fw"           0.13
    "eed"           0.13
    Activation Density: 0.096%

    No Known Activations