INDEX
    Explanations

    references to minimal quantities or small groups in various contexts

    Negative Logits
    "atk"        -0.07
    "heten"      -0.06
    " stuff"     -0.06
    "357"        -0.06
    "izar"       -0.06
    "egin"       -0.06
    "806"        -0.06
    "iram"       -0.06
    "efa"        -0.06
    " же"        -0.06
    Positive Logits
    " few"       0.09
    " dozen"     0.09
    "few"        0.08
    "几"         0.08   (Chinese, "a few")
    "几个"       0.08   (Chinese, "a few, several")
    "apol"       0.08
    " handful"   0.07
    " пару"      0.07   (Russian, "a couple of")
    "lenme"      0.07
    "spi"        0.07
    Activation Density: 0.052%

    No Known Activations