INDEX
    Explanations

    occurrences of the number "two" in different contexts

    instances of the word "two"

    Negative Logits
    Token       Logit
    "ugu"       -0.76
    "asta"      -0.72
    "awaru"     -0.72
    "amaru"     -0.69
    "ubs"       -0.69
    "Fed"       -0.67
    "rir"       -0.66
    "ysical"    -0.66
    "aukee"     -0.63
    "uffer"     -0.63
    Positive Logits
    Token       Logit
    " thirds"   1.42
    " dozen"    1.00
    " halves"   0.98
    " weeks"    0.95
    " hundred"  0.88
    "teen"      0.88
    "een"       0.88
    "fold"      0.86
    "thirds"    0.86
    " decades"  0.81
    Activation Density: 0.075%

    No Known Activations