INDEX
    Explanations

    references to violent actions and their consequences

    NEGATIVE LOGITS
    Token         Logit
    "needle"      -0.17
    "ends"        -0.16
    "ープ"        -0.15
    "_unpack"     -0.15
    "oric"        -0.15
    " Minds"      -0.14
    ".ERR"        -0.14
    "irts"        -0.14
    "anax"        -0.14
    "allon"       -0.14

    POSITIVE LOGITS
    Token         Logit
    " square"     0.25
    " temple"     0.18
    " below"      0.18
    " BELOW"      0.18
    " grazing"    0.17
    " solar"      0.17
    " across"     0.16
    " hard"       0.16
    "额"          0.16
    " Square"     0.16
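Dashboards of this kind typically derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that convention. The names `decoder_dir`, `unembed`, `vocab`, `logit_effects` and `top_and_bottom` are hypothetical, and the weights are toy values, not this feature's actual parameters.

```python
# Toy sketch (assumption): a feature's logit effect on each vocabulary token is
# the dot product of its decoder direction with that token's unembedding column.

def logit_effects(decoder_dir, unembed, vocab):
    """Map each token to decoder_dir . unembed[:, j] for its column j."""
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Hypothetical toy weights: d_model = 3, vocabulary of 4 tokens.
decoder_dir = [1.0, 0.0, -1.0]
unembed = [
    [0.5, 0.25, -0.25, 0.0],
    [3.0, 3.0, 3.0, 3.0],
    [0.25, 0.125, 0.25, 0.125],
]
vocab = ["alpha", "beta", "gamma", "delta"]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("POSITIVE LOGITS", positive)
print("NEGATIVE LOGITS", negative)
```

Both lists are ordered by magnitude, most extreme first, which matches how the tables above are laid out.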
    Activation density: 0.047%
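If, as is usual for these dashboards, the density figure is the share of sampled tokens on which the feature activates, 0.047% corresponds to roughly 47 tokens in every 100,000. The sketch below assumes that definition with a strict `> 0` threshold. The helper `activation_density` and the sample are hypothetical and illustrate only the arithmetic.

```python
# Toy sketch (assumption): activation density = percentage of sampled token
# positions where the feature's activation is strictly above a threshold.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions with activation > threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 47 active positions out of 100,000 tokens.
sample = [0.0] * 100_000
for pos in range(0, 100_000, 2_000)[:47]:
    sample[pos] = 1.5

density = activation_density(sample)
print(f"Activation density: {density:.3f}%")
```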

    No Known Activations