INDEX
    Explanations

    words related to legal matters and medical conditions

    references and terminology related to gender and social roles

    Negative Logits
    " Kan"      -1.01
    " Kap"      -0.99
    " Gan"      -0.92
    " Kahn"     -0.87
    " Mull"     -0.87
    " Joan"     -0.86
    " Stefan"   -0.86
    " UX"       -0.85
    " Khan"     -0.84
    " Guth"     -0.83
    Positive Logits
    "ears"      1.14
    "ĵ"         0.93
    "paralle"   0.93
    "orse"      0.91
    "orses"     0.87
    "ear"       0.87
    "oren"      0.86
    "inder"     0.86
    "ory"       0.85
    "eryl"      0.85
    Activation Density: 0.413%

    No Known Activations