    Explanations
    Negative Logits
    " and"          -1.70
    " arid"         -1.62
    [whitespace]    -1.52
    [not rendered]  -1.49
    " také"         -1.45
    " antes"        -1.42
    " he"           -1.41
    " <"            -1.41
    " et"           -1.39
    " eres"         -1.36
    Positive Logits
    "they"          1.91
    [not rendered]  1.85
    " unangemess"   1.75
    " élector"      1.68
    "💕"            1.65
    "🫣"            1.65
    "3"             1.64
    [not rendered]  1.63
    " Berücksich"   1.61
    " distinguer"   1.58
    Activation Density: 0.035%

    No Known Activations