    Explanations

    references to power dynamics and control over situations or entities

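    Negative Logits
        Token        Logit
        "urat"       -0.16
        "inç"        -0.15
        "esome"      -0.15
        "enburg"     -0.15
        "вин"        -0.14
        "ůr"         -0.14
        "载"         -0.14   (byte-level form: è½½)
        "URED"       -0.14
        "iets"       -0.14
        "prit"       -0.13

    Positive Logits
        Token        Logit
        " over"       0.56
        " sobre"      0.37
        "_over"       0.37
        " над"        0.35
        "over"        0.33
        "Over"        0.33
        " över"       0.33
        " OVER"       0.32
        " Over"       0.31
        "过"          0.29   (byte-level form: è¿ĩ)

    Tokens are quoted so that leading spaces are visible.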
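    Dashboards of this kind commonly produce the Positive and Negative Logits lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes that procedure; the function name top_bottom_logits, the toy vocabulary, and the toy matrices are hypothetical and are not taken from this page. It ignores any final layer norm, which real pipelines may or may not apply.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction through
    the unembedding W_U (shape d_model x vocab_size) and return the k
    most-promoted and k most-suppressed tokens with their logit values."""
    logits = decoder_dir @ W_U            # direct effect on each token's logit
    order = np.argsort(logits)            # indices in ascending logit order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: 4-dimensional residual stream, 6-token vocabulary.
vocab = [" over", " sobre", "urat", "inç", " the", "esome"]
W_U = np.eye(4, 6)                        # toy unembedding matrix
decoder_dir = np.array([0.5, 0.3, -0.2, -0.1])

pos, neg = top_bottom_logits(decoder_dir, W_U, vocab, k=2)
print(pos)  # [(' over', 0.5), (' sobre', 0.3)]
print(neg)  # [('urat', -0.2), ('inç', -0.1)]
```

Under this assumption, a feature whose decoder direction aligns with the unembeddings of " over", " над", and " över" would produce the multilingual "over" cluster seen in the Positive Logits list.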
    Activation density: 0.082%
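    Activation density is typically reported as the fraction of token positions on which the feature's activation is above zero. Assuming that definition, a density of 0.082% means the feature fires on roughly one token in every 1,220 (1 / 0.00082 ≈ 1,219.5). The sketch below computes the quantity from a hypothetical array of per-token activations; the function name and the zero threshold are assumptions.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose activation
    exceeds the threshold (assumed to be 0 for a ReLU-style SAE feature)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy data: 82 active positions out of 100,000 tokens gives 0.082%.
acts = np.zeros(100_000)
acts[:82] = 1.3
density = activation_density(acts)
print(f"{density:.3%}")  # 0.082%
```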

    No Known Activations