INDEX
    Explanations

    phrases with the word "two"

    references to the number two

    Negative Logits
    Token         Logit
    "ugu"         -0.77
    "asta"        -0.75
    "atown"       -0.72
    "iggins"      -0.68
    "annel"       -0.68
    "ubs"         -0.68
    "ushima"      -0.67
    "yz"          -0.66
    "renheit"     -0.66
    "ovi"         -0.66

    Positive Logits
    Token         Logit
    " halves"     1.23
    " thirds"     1.19
    "fold"        1.02
    " sexes"      0.95
    " sides"      0.89
    " dozen"      0.88
    "teenth"      0.85
    " Kore"       0.84
    " main"       0.80
    " hundred"    0.79
    Activation Density: 0.054%

    No Known Activations