    Explanations

    The word "one" and its variants, reflecting a focus on singularity or an emphasis on individual instances.

    Negative Logits
    Token             Logit
    "atile"           -0.14
    "pliers"          -0.13
    "TRL"             -0.13
    "/from"           -0.13
    " Prefer"         -0.13
    " собі"           -0.12
    "piler"           -0.12
    "ched"            -0.12
    "-Identifier"     -0.12
    "dle"             -0.12

    Positive Logits
    Token             Logit
    " advantage"      0.23
    " of"             0.22
    " such"           0.22
    " consequence"    0.21
    " benefit"        0.21
    " reason"         0.21
    " thing"          0.20
    " drawback"       0.20
    " difficulty"     0.20
    " wonders"        0.20
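
    Logit lists like these are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the most negative and most positive scores. This page does not say how its lists were computed, so the sketch below is an assumption about the usual approach, not the dashboard's code. The helper name `top_logit_tokens` is hypothetical, and the 2-dimensional vectors in the example are toy values, not real model weights.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def top_logit_tokens(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector, then return the k lowest-scoring and the
    k highest-scoring tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d example (made-up numbers, for illustration only).
    toy_unembedding = {
        " advantage": [2.0, 0.0],
        " of": [1.0, 1.0],
        "atile": [-1.5, 0.0],
        "pliers": [-0.5, 3.0],
    }
    neg, pos = top_logit_tokens([1.0, 0.0], toy_unembedding, k=2)
    print(neg)  # [('atile', -1.5), ('pliers', -0.5)]
    print(pos)  # [(' advantage', 2.0), (' of', 1.0)]
```

    Real tooling would typically compute all scores at once as a single matrix-vector product over the full vocabulary; the plain-Python loop here only keeps the sketch dependency-free.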
    Activation density: 0.058% of tokens
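
    A density of 0.058% works out to roughly 58 of every 100,000 tokens. The sketch below shows one way such a figure could be computed, assuming density means the percentage of token positions where the feature's activation is strictly above zero over some sampled corpus; the threshold and corpus are assumptions not stated on this page, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 58 firing positions out of 100,000 sampled tokens (toy data).
    toy_acts = [0.0] * 99_942 + [1.0] * 58
    print(activation_density(toy_acts))  # 0.058
```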

    No Known Activations