    Negative Logits
    "nya"            -0.43
    "er"             -0.43
    "on"             -0.40
    "l"              -0.38
    " Korn"          -0.38
    " geralmente"    -0.38
    "a"              -0.37
    "or"             -0.37
    "v"              -0.36
    "8"              -0.36

    Positive Logits
    " 지"             1.59
                      1.48
                      1.45
    " 知"             1.14
    " Zhi"            1.13
    "Zhi"             1.08
                      0.94
                      0.88
    "지는"            0.87
    "지를"            0.86
    Act Density 0.002%

    No Known Activations