    Explanations

    instances of the word "it"

    Negative Logits
    "oped"        -0.18
    "izzle"       -0.16
    "quist"       -0.16
    "ward"        -0.15
    " Friedman"   -0.15
    " Strauss"    -0.15
    "ersh"        -0.14
    "搬"          -0.14
    "chn"         -0.14
    "ΕΠ"          -0.13
    Positive Logits
    "nier"        0.16
    "ron"         0.14
    "erm"         0.14
    "hani"        0.14
    "вол"         0.14
    "úsqueda"     0.14
    "нім"         0.14
    "екти"        0.14
    "ipa"         0.14
    "ivec"        0.13
    Act Density 0.018%

    No Known Activations