    Explanations

    references to the word "her" and its variations

    Negative Logits
    Token          Logit
    " Monfieur"    -0.87
    " raiſ"        -0.86
    " Houſe"       -0.82
    " uſe"         -0.78
    " Оно"         -0.77
    " purpoſe"     -0.74
    " Efq"         -0.74
    " cauſe"       -0.73
    " pleaſure"    -0.72
    " uſed"        -0.70
    Positive Logits
    Token          Logit
    " her"         3.01
    "her"          2.01
    " Her"         1.89
    " HER"         1.89
    "Her"          1.84
    " his"         1.68
    "她的"         1.62
    " hennes"      1.57
    "HER"          1.53
    " hers"        1.50
    Activation Density: 0.056%

    No Known Activations