    Explanations

    terms related to measurement or magnitude across various contexts

    Negative Logits
    zelf      -0.19
    ernals    -0.17
    iates     -0.16
    est       -0.16
    sell      -0.16
    reads     -0.15
    urer      -0.14
    nings     -0.14
    role      -0.14
    urers     -0.14

    Positive Logits
    -up        0.24
    -down      0.23
    able       0.22
    ToFit      0.20
    -out       0.20
    out        0.20
    tron       0.19
    way        0.17
    ardy       0.17
    azy        0.17
    Activation Density: 0.016%

    No Known Activations