    Explanations

    mentions of the number "two."

    the word "two" in various contexts

    Negative Logits
    "ugu"         -0.80
    "asta"        -0.74
    "amaru"       -0.71
    "ubs"         -0.70
    "rolet"       -0.69
    " Caption"    -0.68
    "rir"         -0.68
    "aukee"       -0.66
    "iggins"      -0.66
    "awaru"       -0.64
    Positive Logits
    " thirds"     1.50
    " halves"     1.06
    " dozen"      1.02
    " weeks"      1.02
    " hundred"    0.95
    "fold"        0.94
    "teen"        0.93
    " decades"    0.90
    "een"         0.86
    "teenth"      0.86
    Activation Density: 0.103%

    No Known Activations