    Explanations

    concepts relating to reflection and introspection

    Negative Logits
    Token           Logit
    "енка"          -0.07
    "nesc"          -0.07
    "terra"         -0.07
    "ephy"          -0.07
    "arkin"         -0.07
    "ovní"          -0.07
    "mods"          -0.07
    "ymes"          -0.07
    " æ¯"           -0.07
    "tera"          -0.06
    Positive Logits
    Token           Logit
    " reflection"   0.14
    " Reflection"   0.13
    " reflections"  0.13
    "reflection"    0.13
    " mirrors"      0.13
    " mirror"       0.13
    "Reflection"    0.13
    " Mirror"       0.12
    " reflected"    0.12
    " reflect"      0.11
    Activation Density: 0.021%

    No Known Activations