    Negative Logits
    Token        Logit
    "u"          1.21
    "in"         1.02
    "ین"         0.93
    "ید"         0.83
    "uad"        0.81
    "uhi"        0.80
    "salir"      0.80
    "on"         0.80
    "et"         0.80
    "un"         0.80
    Positive Logits
    Token        Logit
    "("          0.68
    " Festival"  0.67
    "↵↵"         0.64
    " Block"     0.64
    " Cafe"      0.64
    (whitespace) 0.64
    " blocker"   0.63
    "ка"         0.63
    " State"     0.62
    " DevOps"    0.62
    Activation Density: 0.001%

    No Known Activations