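    Negative Logits (tokens whose output logits this feature most decreases; leading spaces inside quotes are part of the token)
    Token              Logit
    "]:↵"              -0.08
    " Navigate"        -0.06
    " ContentType"     -0.06
    (not rendered)     -0.06
    "👆"               -0.06
    "assuming"         -0.06
    " Hit"             -0.06
    " amazed"          -0.06
    "Expected"         -0.06
    "_readable"        -0.06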
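    Positive Logits (tokens whose output logits this feature most increases)
    Token              Logit
    " Leon"            0.07
    " coats"           0.07
    " włos"            0.07   (Polish, "hair")
    " elegant"         0.07
    " Scalars"         0.07
    "на"               0.06   (Slavic preposition, "on")
    ".tab"             0.06
    " villagers"       0.06
    " rout"            0.06
    " denen"           0.06   (German relative/demonstrative pronoun, dative plural)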
    Activation density: 0.004% (share of tokens on which the feature is active)

    No known activations