INDEX
    Explanations
    Negative Logits
     in         -0.09
     in         -0.08
     despite    -0.08
    .In         -0.07
    In          -0.07
    uell        -0.07
     In         -0.07
     numa       -0.07
     IN         -0.07
    -in         -0.06
    Positive Logits
     شخصية      0.07
     Card       0.06
     aspir      0.06
                0.06
     Fayette    0.06
    elenium     0.06
    GY          0.06
    "+↵         0.06
     nombre     0.05
    font        0.05
    Activation Density: 0.120%

    No Known Activations