INDEX
    Explanations

    instances of the word "control" in relation to power dynamics

    Negative Logits
    ibal        -0.16
    éĽ          -0.14
    ussen       -0.14
    ornings     -0.14
    allen       -0.14
    EDIATE      -0.14
    æĮĻ         -0.14
    oris        -0.14
    arium       -0.14
    åĪ·         -0.13
    Positive Logits
    iser        0.16
     Platt      0.15
    ufe         0.14
    彦          0.14
    /browse     0.14
    ervo        0.14
    lun         0.14
    nde         0.13
    705         0.13
    imb         0.13
    Activation Density: 0.034%

    No Known Activations