INDEX
    Explanations

    references to the concept of "one."

    Negative Logits
    Token       Logit
    " ones"     -0.25
    " one"      -0.20
    "lant"      -0.18
    " Ones"     -0.18
    "rd"        -0.18
    "mente"     -0.17
    "land"      -0.17
    "se"        -0.17
    "nya"       -0.17
    "th"        -0.16
    Positive Logits
    Token             Logit
    "-third"          0.31
    "onta"            0.29
    "-way"            0.26
    "-half"           0.26
    "-dimensional"    0.25
    "-sided"          0.25
    "/t"              0.24
    " particular"     0.23
    "-two"            0.23
    "-stop"           0.23
    Act Density 0.163%

    No Known Activations