INDEX
    Explanations

    instances of the word "one"

    Negative Logits
    "thane"      -0.16
    "mouseup"    -0.16
    "nds"        -0.15
    "hee"        -0.15
    "arken"      -0.15
    "anca"       -0.14
    "ouce"       -0.14
    "mini"       -0.14
    "tems"       -0.14
    "ouple"      -0.14
    Positive Logits
    " among"      0.32
    " amongst"    0.27
    "among"       0.24
    " Among"      0.23
    "Among"       0.19
    "-third"      0.19
    " of"         0.17
    "-half"       0.17
    " leg"        0.16
    " the"        0.16
    Activation Density: 0.045%

    No Known Activations