INDEX
    Explanations

    themes related to power dynamics and control in various contexts

    NEGATIVE LOGITS
    "icz"       -0.15
    "arios"     -0.14
    "vou"       -0.14
    "aeper"     -0.14
    "shi"       -0.13
    "άλι"       -0.13
    "EXPORT"    -0.13
    "uds"       -0.13
    "langs"     -0.13
    "848"       -0.12
    POSITIVE LOGITS
    " away"     1.77
    " Away"     1.54
    "away"      1.40
    "Away"      1.35
    "-away"     1.27
    "aways"     0.74
    " weg"      0.72
    " AW"       0.59
    ".aw"       0.53
    "awy"       0.47
    Activation Density: 0.449%

    No Known Activations