INDEX
    Explanations

    instances of the word "in" indicating locations or contexts within the text

    Negative Logits
    oref       -0.15
    ched       -0.15
    nero       -0.15
    forth      -0.15
    roj        -0.15
    ovation    -0.14
    heart      -0.14
    /of        -0.14
    hart       -0.14
    ires       -0.14
    Positive Logits
    676        0.16
    729        0.15
    677        0.15
    679        0.15
    şa         0.15
    lä         0.15
    sbin       0.15
     early     0.14
    791        0.14
    697        0.14
    Act Density 0.073%

    No Known Activations