    Negative Logits
    Token               Logit
    " surname"          -0.09
    "Nationality"       -0.08
    " bride"            -0.08
    " nationals"        -0.08
    "Surname"           -0.08
    "surname"           -0.08
    "Maze"              -0.08
    " crane"            -0.08
    " wass"             -0.08
    " infringement"     -0.08
    Positive Logits
    Token               Logit
    "optimized"         0.12
    " optimized"        0.10
    "优化"              0.10
    " optimize"         0.10
    " optimizing"       0.09
    " otim"             0.09
    " optim"            0.09
    " stripe"           0.09
    "默认"              0.09
    " inteligência"     0.09
    Act Density 0.005%

    No Known Activations