    Explanations

    references to the word "one"

    Negative Logits
    Token               Logit
    " iconFacebook"     -0.69
    " }}{\"             -0.63
    "\{\\"              -0.61
    "zbęd"              -0.61
    " underlying"       -0.60
    "UnsafeEnabled"     -0.59
    " odkazy"           -0.59
    "underlying"        -0.58
    " كومونز"           -0.58
    "θα"                -0.57
    Positive Logits
    Token               Logit
    " ones"             1.20
    " Ones"             0.86
    "Ones"              0.74
    "のもの"            0.71
    "するもの"          0.64
    " ours"             0.63
    "InitVars"          0.62
    " counterparts"     0.62
    "したもの"          0.59
    "іга"               0.58
    Act Density 0.231%

    No Known Activations