    Explanations

    expressions of self-awareness and identity

    Negative Logits
    "ules"                -0.17
    "adle"                -0.17
    "рави"                -0.15
    " tail"               -0.14
    " forces"             -0.14
    "敬"                  -0.13
    " Rouge"              -0.13
    "ビー"                -0.13
    " Tail"               -0.13
    " intermediate"       -0.13
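    Byte-level BPE vocabularies store some tokens as printable stand-ins for raw UTF-8 bytes, so a dashboard can show a string like `æķ¬` instead of the character it encodes. The source does not name the model or tokenizer. The sketch below assumes GPT-2's byte-to-unicode table (adapted from GPT-2's `encoder.py`) and reverses that mapping. `decode_token` is a hypothetical helper name.

```python
from typing import Dict

def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style reversible byte -> printable-character table.

    Printable Latin-1 bytes map to themselves; the remaining 68 bytes
    (control bytes, space, DEL, etc.) are shifted to code points 256 and up.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}

# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(decode_token("æķ¬"))     # 敬
print(decode_token("ãĥĵãĥ¼"))  # ビー
print(decode_token("Ġtail"))    # " tail" (Ġ stands for the space byte)
```

    The `errors="replace"` argument keeps the helper from raising on tokens that hold only part of a multi-byte character, which byte-level vocabularies can contain.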
    Positive Logits
    " Workbook"            0.21
    " ego"                 0.21
    " Christ"              0.17
    " Projection"          0.16
    " hol"                 0.16
    " Perception"          0.16
    " brothers"            0.16
    "compileComponents"    0.16
    "projection"           0.16
    " Reality"             0.16
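    Dashboards like this one commonly build the positive and negative logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and ranking tokens by the resulting score. The source does not say how its values were computed, so the sketch below assumes that procedure. It uses plain Python and a hypothetical two-dimensional toy model. The names `top_logits`, `decoder_dir`, and `unembed` are illustrative, and the toy weights echo a few token strings above but are not the real model's weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def top_logits(decoder_dir: List[float],
               unembed: Dict[str, List[float]],
               k: int = 10) -> Tuple[Ranked, Ranked]:
    """Rank tokens by the dot product of the decoder direction with each
    token's unembedding column.

    Returns (negative, positive): the k lowest scores in ascending order
    and the k highest scores in descending order.
    """
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy model: 2-dimensional residual stream, 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    " ego": [0.2, -0.02],
    " Reality": [0.1, -0.1],
    " tail": [-0.1, 0.08],
}
negative, positive = top_logits(decoder_dir, unembed, k=1)
print(negative)  # the single lowest-scoring token, " tail"
print(positive)  # the single highest-scoring token, " ego"
```

    Because only the ranking matters for these lists, the same scores feed both ends: the lowest entries become the negative list and the highest become the positive list.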
    Activation Density: 0.002%
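    Activation density is the share of token positions on which the feature fires. The source gives neither the firing threshold nor the size of the token sample. The minimal sketch below assumes that "fires" means an activation strictly above zero. `activation_density` is a hypothetical helper, and the sample is made up for illustration.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Hypothetical sample: the feature fires on 2 of 100,000 token positions.
sample = [0.0] * 100_000
sample[10] = 3.2
sample[99_000] = 1.7
print(f"{activation_density(sample):.3f}%")  # 0.002%
```

    A density of 0.002% therefore corresponds to roughly 2 firing positions per 100,000 tokens.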

    No Known Activations