INDEX
    Explanations

    themes related to introspection and self-reflection

    Negative Logits
    "θμ"          -0.19
    "iel"         -0.18
    "냥"          -0.15
    "cko"         -0.15
    "RGBA"        -0.15
    "inka"        -0.15
    "aters"       -0.14
    "kem"         -0.14
    "kud"         -0.14
    "align"       -0.14
    Positive Logits
    " inward"     0.32
    "wards"       0.29
    "Internal"    0.26
    " internal"   0.26
    " Inside"     0.26
    "內"          0.25
    " inside"     0.25
    "内"          0.25
    "Inside"      0.25
    " outward"    0.24
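    Each logit list reads like the top and bottom entries of a single score vector over the vocabulary. The sketch below shows one common way such lists are produced. It assumes, and this page does not confirm it, that the scores come from projecting the feature's decoder direction through the model's unembedding matrix. `top_bottom_logits`, `w_dec`, `W_U` and `vocab` are hypothetical names, and the toy numbers are illustrative, not this feature's weights.

```python
import numpy as np

def top_bottom_logits(w_dec, W_U, vocab, k=10):
    # Hypothetical helper (assumption): direct logit effect of one feature.
    # w_dec: (d_model,) decoder direction; W_U: (d_model, vocab_size) unembedding.
    logits = w_dec @ W_U                      # one score per vocabulary token
    order = np.argsort(logits)                # indices in ascending score order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with made-up values (not taken from this feature)
vocab = [" inward", "wards", "RGBA", "align"]
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.32, 0.29, -0.15, -0.14],
                [0.50, -0.10, 0.20, 0.30]])
positive, negative = top_bottom_logits(w_dec, W_U, vocab, k=2)
print(positive)  # [(' inward', 0.32), ('wards', 0.29)]
print(negative)  # [('RGBA', -0.15), ('align', -0.14)]
```

    Sorting once with `argsort` gives both ends of the ranking. Leading spaces in tokens such as " inward" are part of the token string and are kept.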
    Act Density 0.078%
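    Activation density is usually read as the fraction of token positions on which the feature is active. The sketch below computes it under that reading. It assumes "active" means an activation strictly above zero, because the page states neither its threshold nor its corpus. `activation_density` is a hypothetical helper. At 0.078%, the feature would fire on about 78 of every 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Hypothetical helper (assumption): fraction of positions where the
    # feature's activation exceeds the threshold.
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations: 78 nonzero values among 100,000 token positions
acts = np.zeros(100_000)
acts[:78] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.078%
```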

    No Known Activations