INDEX
    Explanations

    words ending with "owers"

    phrases related to power dynamics

    references to power dynamics and authority figures

    Negative Logits
    ERAL         -0.72
    âĸ¬          -0.69
    Philipp      -0.69
    ר            -0.65
    ׾            -0.64
    ric          -0.64
    cs           -0.64
    Condition    -0.64
    Pacific      -0.63
    à©           -0.63
    Positive Logits
    chwitz       0.96
    hops         0.96
    peed         0.93
    ynthesis     0.92
    kinson       0.91
    pace         0.89
    ktop         0.88
    hift         0.88
    uits         0.87
    creen        0.83
    Activation Density 0.008%

    No Known Activations