INDEX
    Explanations

    references to the number two and its variations in different contexts

    Negative Logits
    " of"              -0.17
    " among"           -0.17
    " fewer"           -0.17
    "inding"           -0.15
    "both"             -0.15
    "Laughs"           -0.14
    "among"            -0.14
    "1"                -0.14
    " amongst"         -0.14
    "ena"              -0.13
    Positive Logits
    " remaining"       0.26
    "remaining"        0.23
    " aforementioned"  0.22
    "aviest"           0.21
    " newest"          0.21
    "latest"           0.21
    " latest"          0.20
    "Remaining"        0.18
    "fold"             0.18
    " Remaining"       0.18
    Activation Density: 0.123%

    No Known Activations