INDEX
    Explanations

    phrases and terms related to support and communication

    Negative Logits
    Token              Logit
    " she"             -2.40
    "she"              -2.13
    "She"              -1.92
    " She"             -1.65
    "His"              -1.45
    " она"             -1.30
    "SHE"              -1.29
    " SHE"             -1.29
    "his"              -1.25
    " shes"            -1.20
    Positive Logits
    Token              Logit
    " her"             1.12
    " herr"            0.56
    " HER"             0.53
    "PYX"              0.51
    " herre"           0.50
    " hei"             0.50
    " елның"           0.49
    " ehr"             0.48
    "confirmButton"    0.48
    "her"              0.47
    Act Density 0.387%

    No Known Activations