    Explanations

    instances of the word "for."

    Negative Logits
    Token        Logit
    "een"        -0.17
    " Bram"      -0.15
    "tor"        -0.15
    "eer"        -0.15
    " btw"       -0.15
    "annies"     -0.15
    " Laurel"    -0.15
    "ingly"      -0.14
    "ef"         -0.14
    " fact"      -0.14
    Positive Logits
    Token                Logit
    "izzo"               0.19
    "okus"               0.18
    "ĵåIJį"               0.16
    "gings"              0.15
    "گاÙĨ"               0.15
    "rer"                0.15
    "بس"                 0.15
    "ác"                 0.15
    "!!!!↵↵"             0.15
    "HeaderInSection"    0.15
    Activation Density: 0.060%

    No Known Activations