    Explanations

    concepts related to freedom or lack of constraints

    Negative Logits
    "ング"       -0.17
    "uality"    -0.16
    "lou"       -0.16
    "riad"      -0.16
    "py"        -0.16
    "la"        -0.15
    "lah"       -0.15
    "aul"       -0.15
    "st"        -0.15
    "nya"       -0.15
    Positive Logits
    "bies"       0.34
    "bie"        0.32
    "-floating"  0.28
    " lance"     0.26
    "zers"       0.26
    "bsd"        0.25
    "-wheel"     0.25
    "-flow"      0.24
    "zed"        0.24
    "-standing"  0.24
    Activation density: 0.065%

    No Known Activations