INDEX
    Explanations

    specific phrases and word combinations that don't seem to follow grammatical or contextual rules

    expressions that indicate societal attitudes towards gender roles

    Negative Logits
    isSpecialOrderable  -0.72
    é¾įå¥ij士             -0.70
    cture               -0.69
    onymous             -0.65
     successor          -0.64
    ãĥ¯ãĥ³              -0.62
    orney               -0.61
    çIJ                  -0.60
    manent              -0.60
    mere                -0.60
    Positive Logits
     deserve            0.92
     rejoice            0.92
     alike              0.86
     behave             0.86
     prefer             0.86
     instinctively      0.85
     notoriously        0.84
     differ             0.83
     thrive             0.82
     flock              0.82
    Activation Density: 0.444%

    No Known Activations