INDEX
    Explanations

    instances of the word "two" in various contexts

    Negative Logits
    " Perfect"       -0.53
    "Perfect"        -0.50
    " perfect"       -0.50
    "head"           -0.49
    "perfect"        -0.49
    " perfection"    -0.49
    "perf"           -0.47
    "antian"         -0.46
    " Cavendish"     -0.46
    "ling"           -0.45

    Positive Logits
    " two"           1.49
    "two"            1.35
    " Two"           1.27
    "Two"            1.27
    "TWO"            1.19
    " TWO"           1.13
    " двух"          1.13
    " deux"          1.12
    " zwei"          1.10
    " berdua"        1.05

    Act Density 0.259%

    No Known Activations