INDEX
    Explanations
    Head Attr Weights
    0:0.08
    1:0.08
    2:0.07
    3:0.08
    4:0.07
    5:0.07
    6:0.07
    7:0.09
    8:0.07
    9:0.08
    10:0.09
    11:0.09
    Negative Logits
    " fem"         -2.93
    "sing"         -2.84
    " menstru"     -2.75
    "sexual"       -2.64
    " noun"        -2.61
    " femin"       -2.57
    " vowel"       -2.54
    " maternal"    -2.50
    " Saiyan"      -2.50
    " pronouns"    -2.46
    Positive Logits
    " Marlins"     2.96
    " Moz"         2.87
    "kefeller"     2.79
    " Oman"        2.62
    " Zot"         2.61
    " Moroc"       2.53
    " Anon"        2.50
    "Phill"        2.48
    " Tripoli"     2.48
    "untled"       2.47
    Act Density 0.000%

    No Known Activations