INDEX
    Explanations

    instances of the word "number" and variations thereof

    Negative Logits
    Token         Logit
    "olik"        -0.19
    " Bever"      -0.17
    "Numbers"     -0.17
    " numbers"    -0.16
    "IED"         -0.16
    " Numbers"    -0.16
    "numbers"     -0.15
    "_numbers"    -0.15
    "rag"         -0.14
    "alez"        -0.14
    Positive Logits
    Token         Logit
    "-one"        0.26
    " ones"       0.23
    " Ones"       0.21
    "-One"        0.21
    "-two"        0.21
    "ones"        0.20
    " One"        0.20
    "ONES"        0.20
    " two"        0.20
    " ONE"        0.20
    Act Density 0.017%

    No Known Activations