INDEX
    Explanations

    references to purpose or intentionality in various contexts

    Negative Logits
    ish      -0.18
    áj       -0.17
    rael     -0.17
    redo     -0.16
    å¯       -0.16
    orna     -0.16
    ey       -0.15
    ä        -0.15
    eding    -0.15
    roller   -0.15
    Positive Logits
    fully     0.36
    ful       0.33
    fulness   0.27
    FUL       0.27
    -built    0.26
    lessly    0.21
    FULL      0.18
    full      0.16
     tw       0.16
    quoi      0.16
    Activation Density: 0.018%

    No Known Activations