INDEX
    Explanations
    Negative Logits
    ixen
    -0.09
     교수
    -0.08
    ynomial
    -0.08
     Professor
    -0.08
     *↵//
    -0.07
    -0.07
    Ingen
    -0.07
    -In
    -0.07
    ician
    -0.07
     agence
    -0.07
    Positive Logits
     rhetoric
    0.08
     religieux
    0.08
     mascul
    0.07
     tinder
    0.07
     Bou
    0.07
     religiosos
    0.07
     Jewish
    0.07
     bou
    0.07
     bénévol
    0.07
     violences
    0.07
    Activation Density 0.011%

    No Known Activations