    Negative Logits
        Token       Logit
        "undry"     -0.19
        "anson"     -0.17
        "alon"      -0.15
        "alian"     -0.15
        "amax"      -0.15
        "alam"      -0.15
        "urent"     -0.14
        "TAG"       -0.14
        " rope"     -0.14
        "ipv"       -0.14
    Positive Logits
        Token       Logit
        "ess"       0.31
        "esses"     0.31
        " cub"      0.26
        " lion"     0.24
        "ardo"      0.23
        " Lion"     0.22
        " mane"     0.22
        " Cub"      0.20
        " lions"    0.20
        "ESS"       0.19
    Activation Density: 0.007%

    No Known Activations