INDEX
    Explanations

    expressions related to gender differences in preferences or behaviors

    intensity or negativity

    Negative Logits
    Token              Logit
    "ScopeManager"     -0.52
    "LabelTagHelper"   -0.52
    "momile"           -0.46
    " Picchu"          -0.44
    " ganchillo"       -0.44
    "nestjs"           -0.43
    " camry"           -0.43
    "zucker"           -0.43
    " kasarigan"       -0.42
    " Alzheimer"       -0.42
    Positive Logits
    Token              Logit
    " violent"         0.93
    " aggressive"      0.90
    " badass"          0.89
    " adrenaline"      0.84
    " ferocious"       0.84
    " fierce"          0.82
    " warlike"         0.81
    "violent"          0.79
    "aggressive"       0.77
    " aggression"      0.76
    Act Density 1.465%

    No Known Activations