    Explanations

    references to the concept of "meaning" in various contexts

    Negative Logits
    Token         Logit
    "eday"        -0.16
    "lush"        -0.15
    "uter"        -0.14
    "icle"        -0.14
    "erty"        -0.14
    "bury"        -0.14
    "ideshow"     -0.14
    "aura"        -0.14
    "ap"          -0.13
    "urch"        -0.13
    Positive Logits
    Token         Logit
    "fully"       0.29
    "FUL"         0.24
    "ful"         0.23
    "lessly"      0.21
    "fulness"     0.19
    "iful"        0.18
    "lessness"    0.18
    "nes"         0.17
    (whitespace)  0.17
    " behind"     0.15
    Activation density: 0.023%
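As a rough reading aid, and assuming that activation density means the share of token positions on which the feature's activation is above zero, 0.023% works out to about one token in every 4,350. The sketch below computes that fraction over a hypothetical list of activations; the function name and the zero threshold are illustrative assumptions, not the dashboard's actual implementation.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: the dashboard counts a feature as firing when its
    activation is strictly greater than zero.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, of which 23 have nonzero activation.
acts = [0.0] * 100_000
for i in range(23):
    acts[i * 4000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3%}")  # prints "0.023%"
```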

    No Known Activations