INDEX
    Explanations

    occurrences of the word "for" in various contexts

    Negative Logits
    Token        Logit
    "amd"        -0.15
    "otes"       -0.15
    "marshall"   -0.15
    "æ®Ĭ"        -0.14
    " Hamm"      -0.14
    "óm"         -0.14
    " Kahn"      -0.14
    " always"    -0.14
    " even"      -0.14
    "f"          -0.14
    Positive Logits
    Token        Logit
    "untu"       0.19
    "arget"      0.18
    "าะ"         0.16
    "isman"      0.15
    "ASS"        0.15
    "داÙħ"       0.14
    "iesz"       0.14
    "dech"       0.14
    "izio"       0.14
    "unts"       0.14
    Activation Density: 0.004%

    No Known Activations