INDEX
    Explanations

    terms related to philosophical concepts

    Negative Logits
    Token    Logit
    e        -0.41
    t        -0.32
    eck      -0.30
    i        -0.29
    eh       -0.28
    eut      -0.28
    eel      -0.28
    ebo      -0.28
    s        -0.27
    eam      -0.27
    Positive Logits
    Token       Logit
    aurus       0.35
    copy        0.34
    hiba        0.32
    patial      0.29
    keleton     0.29
    otros       0.26
    ynthesis    0.25
    ystem       0.25
    ystems      0.24
    ocial       0.24
    Act Density 0.027%

    No Known Activations