INDEX
    Explanations

    occurrences of the word "one."

    Negative Logits
    " ones"    -0.27
    " one"     -0.25
    "lant"     -0.18
    " One"     -0.17
    " Ones"    -0.17
    "rd"       -0.17
    " ONE"     -0.17
    " یک"      -0.16
    " một"     -0.16
    "ses"      -0.15
    Positive Logits
    "-third"          0.29
    "onta"            0.26
    "-half"           0.26
    "-way"            0.25
    "-dimensional"    0.25
    "-sided"          0.24
    "/t"              0.23
    "ida"             0.22
    " particular"     0.22
    "-offs"           0.22
    Act Density 0.159%

    No Known Activations