INDEX
    Explanations

    instances of the word "two" or its numeric representation

    NEGATIVE LOGITS
    Token             Logit
    "bast"            -0.16
    "imer"            -0.15
    " ones"           -0.14
    "erm"             -0.14
    "imers"           -0.14
    "laus"            -0.14
    "stå"             -0.14
    "ertas"           -0.13
    "hausen"          -0.13
    "umer"            -0.13
    POSITIVE LOGITS
    Token             Logit
    "-thirds"         0.29
    "-dimensional"    0.25
    " dozen"          0.24
    "gether"          0.23
    "ième"            0.22
    "/th"             0.21
    "nd"              0.20
    "-way"            0.19
    "-fold"           0.18
    "-sided"          0.17
    Activation Density: 0.120%

    No Known Activations